PROJECT SUMMARY

The objective of this project is to investigate the potential use of
logistic regression in rainfall estimation from satellite
measurements. Satellite measurements provide covariate information
in terms of radiances from different remote sensors. The logistic
regression technique can effectively accommodate many covariates and
test their significance in the estimation. The outcome from the
logistic model is the probability that the rainrate of a satellite
pixel is above a certain threshold. By varying the threshold, a
rainrate histogram can be obtained, from which the mean and
variance can be estimated.

A logistic model is developed and applied to rainfall data 
collected during GATE, using as covariates the fractional rain area 
and a radiance measurement which is deduced from a microwave 
temperature-rainrate relation. It is demonstrated that the fractional 
rain area is an important covariate in the model, consistent with the 
use of the so-called 'Area Time Integral' in estimating total rain 
volume in other studies. 

In order to calibrate the logistic model, simulated rain fields 
generated by rainfield models with prescribed parameters are needed. 
A stringent test of the logistic model is its ability to recover the
prescribed parameters of simulated rain fields. A rain field
simulation model which preserves the fractional rain area and 
lognormality of rainrates as found in GATE is developed. The 
simulated rain fields are quite realistic. A stochastic regression 
model of branching and immigration whose solutions are lognormally 
distributed in some asymptotic limits has also been developed. This 
model makes no assumption about the law of proportionate effect which 
is often quoted to achieve lognormality. 

This study has demonstrated the effectiveness of the logistic 
technique in examining a large number of covariates and in testing 
their significance. By identifying important covariates and the way 
in which they enter the estimation procedure, this technique will be 
useful in the design of a system of remote sensors for the measurement 
of rainfall from space and in the development of satellite rainfall 
retrieval algorithms. 

1. Objective

The Earth distinguishes itself from other planets by the presence
of water substances. The heat stored in various forms of water
substance, the heat transported by atmospheric water vapor and by the
oceans, and the heat released during transformations between the
different phases have shaped Earth's climate to a large extent. Water
vapor is the working substance of Earth's atmosphere. It is created
by evaporation, which removes excess heat from the oceans and the
land; it participates in the radiative heating of the atmosphere
by emission in the long wave regime of the atmospheric spectrum; and
it transports excess heat from the tropics and deposits it at high
latitudes, thus moderating the extremes of heat and cold on Earth. In
the final stage of this branch of the water cycle, it changes phase
and is deposited as precipitation on the Earth's surface.

Because of its scale of variability, precipitation is probably
one of the least known yet most sensitive parameters in the water
budget over land and oceans (Miller 1977, Laevastu, et al., 1969). A
knowledge of the amount and distribution of precipitation is crucial
to our understanding of the large scale dynamics of the oceans and
atmosphere. Strong empirical as well as theoretical evidence has
suggested that condensational heating of the tropical atmosphere, as
indicated by the amount of precipitation, is instrumental to
circulation anomalies worldwide (Horel and Wallace 1981, Gill 1982).

Precipitation and the antecedent latent heat release have been
incorporated into General Circulation Models (GCMs) of the Earth's
atmosphere for some time, but their intensity and distribution are
still poorly modeled. A detailed global rainfall data set is therefore
needed to calibrate the GCMs for mean and anomalous conditions. To
accomplish this, a satellite rainfall monitoring mission to measure
precipitation over the tropics, the Tropical Rainfall Measuring
Mission (TRMM), has been proposed (Theon, et al., 1986). The
objective is to obtain at least 3 years of monthly mean rainfall data
over the tropical regions.

To achieve this, a retrieval algorithm by which satellite
measurements can be converted to rainfall data is needed. The
ultimate objective of our work is to develop such an algorithm. The
immediate objective is to investigate the potential use of logistic
regression in rainfall estimation from space. Since rainfall is not
directly measured, the available information is covariate information
in terms of radiances from satellite sensors. The logistic regression
technique is especially suited for this purpose since it can
effectively accommodate a large number of covariates and readily test
their significance. A secondary objective is to study the statistics
of rain fields, which will be useful in interpreting problems such as
"beam filling" and in estimating biases due to sampling.

In section 2, the techniques of estimating rainfall from space
are briefly reviewed. The need for multispectral estimation
techniques is stressed. Section 3 discusses the logistic model and
demonstrates its use in identifying important covariates. The
scenario of concomitant observations of microwave and fractional
rain area data, which may be obtained from visible or infrared
measurements, is investigated. A major finding is the importance of
rain area in estimating total rainfall. Since observation of the
fractional area depends on the footprint size of the
observation, the statistics of rainfall fields are examined in section
4 using the GATE data as an example. To calibrate the logistic
technique, a simple rain field simulation model and a point process
regression model which exhibit statistical properties of the GATE
rainfall data are developed in section 5. The regression model is
capable of producing rainfall rates with a lognormal distribution in
some asymptotic limits. These limiting conditions are satisfied in
the GATE data for large area averages. The dependence of
statistical parameters in rain fields on scale is addressed in section
6. Section 7 summarizes our findings and makes recommendations for
future work.

2. Review of Satellite Estimation Techniques 

The need for satellite monitoring of global rainfall has been 
stressed by Atlas and Thiele (1982) and Austin and Geotis (1980). 
Barrett and Martin (1981) have reviewed the various estimation 
techniques. Another good reference is the preprint volume of the
second conference on satellite meteorology, in which two sessions are
devoted to the estimation of rainfall from space.

Satellite data are derived from three regions of the atmospheric
spectrum: the visible (VIS), infrared (IR), and microwave windows.
The techniques which use information in the visible part of the
spectrum rely on identifying cloud types and assigning rainrates to
them. This cloud type-rain rate relation depends on the local
climatology, and hence this method must be calibrated regionally.

The infrared techniques rely on information on cloud top
temperatures, which are indicators of cloud heights. The implicit
assumption is that the rain-bearing clouds are tall cumulus clouds.
Arkin (1979) developed an index of precipitation which is the number
of pixels within an area of an IR satellite image with temperatures
below 235 K. This index represents the fractional area
of high convective clouds within the area. A comparison with
rainfall data measured during GATE gives a correlation coefficient of
0.87. Arkin's index of precipitation has been adopted for
local calibration of rainfall during the Tropical Ocean Global
Atmosphere (TOGA) experiment. However, at middle to high latitudes,
rainfall from large-scale low-level stratiform clouds becomes
increasingly dominant, and this cloud area index becomes less
effective in estimating rainfall in those regions.



A more direct approach relies on the radiative properties of
raindrops in the microwave portion of the spectrum. By modeling the
vertical structure of a rain cloud, a rainfall rate-microwave 
temperature relation can be established. Hence, a rainfall rate can 
be estimated from an observation of the microwave emission. There are 
several pitfalls in this approach. 

1. Unfilled Field of View (FOV): Microwave measurements usually
have large footprint sizes, and hence the field of view of
the footprint is usually not filled with rain. A bias is
introduced if measurements from the unfilled beam are used
to retrieve rainfall through the
microwave temperature-rainfall rate relation.

2. Saturation: The microwave measurements become saturated at
high rainrates. At 19 GHz, the beam becomes saturated at
rainrates above 15-20 mm/hr. Although only a small fraction
of the measurements fall in this portion of the
rain spectrum, the high rainrates account for a large
fraction of the total rainfall.

3. Rainfall Rate-Microwave Temperature Relation: In deriving the
rainfall rate-microwave temperature relation, a cloud model
has to be assumed. Such a relationship is rather sensitive
to the assumed parameters, such as the profiles of ice and liquid
water content. Rather different relationships are found for
different modeling assumptions. For example, the
relationship presented by Wilheit, et al. (1977), shows a
monotonic increase of microwave temperature as a function of
rainfall rate in the range from 0 to about 15 mm/hr at 19
GHz, whereas that of Wu and Weinman (1984) shows a decrease.
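
The unfilled field-of-view bias in item 1 can be illustrated with a
short numerical sketch. The temperature-rainrate relation used here
is a hypothetical saturating curve chosen only for illustration (the
names brightness_temp and invert_temp and the values of t_clear,
t_sat and r_scale are assumptions, not the relation used in this
report). Because the assumed relation is concave, averaging
temperatures over a partly filled footprint and then inverting gives
a rainrate below the true footprint mean.

```python
import numpy as np

def brightness_temp(rain, t_clear=150.0, t_sat=270.0, r_scale=8.0):
    """Hypothetical saturating T-R relation in kelvin (an assumption)."""
    rain = np.asarray(rain, dtype=float)
    return t_clear + (t_sat - t_clear) * (1.0 - np.exp(-rain / r_scale))

def invert_temp(temp, t_clear=150.0, t_sat=270.0, r_scale=8.0):
    """Invert the assumed relation: temperature (K) to rainrate (mm/hr)."""
    frac = (np.asarray(temp, dtype=float) - t_clear) / (t_sat - t_clear)
    return -r_scale * np.log(1.0 - frac)

def beam_filling_bias(rain_in_fov):
    """Relative bias of the footprint retrieval against the true mean."""
    true_mean = np.mean(rain_in_fov)
    retrieved = invert_temp(np.mean(brightness_temp(rain_in_fov)))
    return (retrieved - true_mean) / true_mean

# Footprint of 64 sub-pixels: one quarter raining at 10 mm/hr, rest dry.
partly_filled = np.zeros(64)
partly_filled[:16] = 10.0
print(beam_filling_bias(partly_filled))      # negative: underestimate

# Same mean rainrate spread uniformly over the footprint: no bias.
print(beam_filling_bias(np.full(64, 2.5)))
```

The sign and size of the bias depend entirely on the shape of the
assumed relation; a relation that decreases with rainrate would
behave differently.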

Estimation schemes which combine information from the different
atmospheric channels seem to yield good estimates. Lovejoy and
Austin (1979) developed an algorithm which delineates rain areas from
visible and infrared measurements. Radar detected rain patterns are
used as ground truth and a statistical pattern recognition technique
is used to establish rain area characteristics in the visible and
infrared. Once the rain areas are calculated, the rainfall is
obtained by multiplying the area by a climatological mean rainfall
rate. This multi-spectral approach has had many successful
applications and has been adopted for operational satellite rain
estimation by the Atmospheric and Environmental Service of Canada.

It is argued that if information from different channels is
combined, a better estimation scheme can be developed. Since the
resolutions of the sensors are quite different, it is necessary to
identify the important covariates as well as the way in which
they enter the estimation scheme. In what follows, a logistic model
is described and the scenario of concomitant microwave and IR/VIS
observations which delineate rain area is examined.

3. The Logistic Model

The logistic model is useful in determining the relationship 
between the distribution of a random variable and a set of covariates. 
It has been applied in various forms in reliability testing and the 
analysis of survival data (Cox and Oakes 1984). A detailed treatment of
the logistic model is given by Cox (1970). The model is briefly
described below. 

Let R be the random variable which stands for rainrate and let

    $Z = (Z_1, \ldots, Z_p)$

be the vector of covariables related to R. Suppose we are interested
in estimating the probability

    $P(R \in I)$

where I is a rainrate interval. Let X be defined by

    $X = 1$ if $R \in I$, and $X = 0$ otherwise.

Then

    $P(R \in I) = P(X = 1)$.

In many respects the simplest way to express the dependence of this
probability on explanatory variables or covariates is to postulate the
model [Cox (1970)]

    $P(X = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 Z_1 + \ldots + \beta_p Z_p)}}$

    $P(X = 0) = \frac{1}{1 + e^{\beta_0 + \beta_1 Z_1 + \ldots + \beta_p Z_p}}$

This is the logistic model. This model allows great flexibility in 
the choice of the covariates and in mathematical manipulations. 

The parameters are estimated by maximizing a likelihood function
and the significance of the covariates is readily tested by a
likelihood ratio. The interested reader is referred to our paper
(Chiu and Kedem 1986) for a more detailed discussion. This paper is
attached to this report as attachment A.
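
The estimation and testing steps can be sketched in a few lines.
This is a minimal illustration on synthetic data, not the code used
for the results in attachment A. It assumes the convention
P(X = 1) = 1/(1 + exp(-(b0 + b1 Z1 + ... + bp Zp))), fits the
coefficients by Newton-Raphson maximization of the log-likelihood,
and tests one covariate with a likelihood ratio referred to a
chi-square distribution with one degree of freedom. The function
names and the synthetic covariates are hypothetical.

```python
import math
import numpy as np

def fit_logistic(Z, x, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic fit; returns (coefficients, log-likelihood)."""
    A = np.column_stack([np.ones(len(x)), Z])    # intercept column first
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (x - p)                     # score vector
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)       # Newton-Raphson step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-A @ beta)), 1e-12, 1.0 - 1e-12)
    loglik = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
    return beta, loglik

def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for one dropped covariate."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square, 1 df
    return stat, p_value

# Synthetic example: the first covariate matters, the second does not.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * Z[:, 0])))
x = (rng.random(2000) < prob).astype(float)

beta_full, ll_full = fit_logistic(Z, x)
_, ll_without_first = fit_logistic(Z[:, 1:], x)
_, ll_without_second = fit_logistic(Z[:, :1], x)
print(beta_full)
print(lr_test_1df(ll_full, ll_without_first))    # large statistic
print(lr_test_1df(ll_full, ll_without_second))   # small statistic expected
```

Dropping covariates one at a time in this way is one simple means of
ranking them; it is shown here only to make the likelihood ratio
idea concrete.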

The scenario of a TRMM-like system of sensors which can provide
microwave measurements and fractional rain area within a microwave
footprint size pixel is examined. The data we use are the rainfall
data collected during GATE. The GATE data are binned at 4 km by 4
km and are given at 15 minute intervals. A detailed description of the
data is given in the next section. The microwave temperature is
mimicked through a microwave temperature-rainfall rate relation (see
attachment A). The microwave measurements are assumed to have a
resolution of about 32 km on a side, somewhat similar to the
resolution of the Electrically Scanning Microwave Radiometer (ESMR)
which was flown on board the Nimbus V satellite. From the 4 km by 4
km rainfall rates, a temperature is computed. The temperatures of 64
(32/4 or 8 pixels on a side) neighboring pixels are averaged to
obtain the microwave temperature (T). The fractional rain area with
rainrates above 1 mm/hr (F) is obtained by counting the number of high
resolution pixels (4 km on a side) with rainrates above 1 mm/hr and
dividing by the total number (64) in a large microwave pixel (32 km
on a side). Another index, FI, which is the fractional area with
rainrate in excess of 20 mm/hr, is also used. This index mimics
Arkin's index of high clouds which produce heavy rainfall. To test
the usefulness of the logistic technique, another parameter, TL, is
also included in the estimation. TL is the microwave temperature T at
a lag of 1 time unit (15 minutes). The results are summarized in
table 2 in Chiu and Kedem (1986) (attachment A). The results show that
the inclusion of TL does not improve the model significantly. This is
probably due to persistence in the time series, so that there is not
much new information in TL; most of it is already contained in T. The
results also show that T is the best regressor in the model. Since T
is derived from the rain field itself, this result cannot be taken
literally. An interesting finding is the importance of F in the model.
F is a better regressor than FI, but when the two parameters F and FI
are combined, a better model is obtained. This is consistent with our
finding about the contribution of the rain area in determining the
total rainfall, a point to which we shall return in section 4.
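
The construction of the covariates T, F and FI from the high
resolution field can be sketched as follows. The block averaging
follows the description above (8 by 8 blocks of 4 km pixels). The
temperature-rainrate relation is passed in as a function because the
relation of attachment A is not reproduced here; the one used in the
example is a placeholder, and the function names are hypothetical.

```python
import numpy as np

def block_view(field, k=8):
    """Split a 2-D field into non-overlapping k-by-k blocks.

    Returns an array of shape (ny // k, nx // k, k * k); incomplete
    blocks at the edges are dropped.
    """
    ny, nx = field.shape
    ny, nx = ny - ny % k, nx - nx % k
    f = field[:ny, :nx].reshape(ny // k, k, nx // k, k)
    return f.transpose(0, 2, 1, 3).reshape(ny // k, nx // k, k * k)

def footprint_covariates(rain, temp_of_rain, k=8, low=1.0, high=20.0):
    """Return (T, F, FI) for each k-by-k footprint of a rainrate field."""
    blocks = block_view(np.asarray(rain, dtype=float), k)
    T = temp_of_rain(blocks).mean(axis=-1)    # footprint-mean temperature
    F = (blocks > low).mean(axis=-1)          # fraction above 1 mm/hr
    FI = (blocks > high).mean(axis=-1)        # fraction above 20 mm/hr
    return T, F, FI

# Placeholder T-R relation (an assumption, for illustration only).
def placeholder_temp(rain):
    return 150.0 + 100.0 * (1.0 - np.exp(-rain / 10.0))

rain = np.zeros((16, 16))
rain[:8, :8] = 2.0        # one footprint fully raining at 2 mm/hr
rain[0, 0] = 25.0         # with a single heavy-rain pixel
T, F, FI = footprint_covariates(rain, placeholder_temp)
print(F)                  # rain fraction per footprint
print(FI)                 # heavy-rain fraction per footprint
```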

4. GATE Rainfall Statistics

The fractional area of rain within a pixel is dependent on the 
pixel size and the spatial variability of the rain field. Hence, the 
structure and statistical properties of the rain field need to be
studied.

4.1 The Data 

The study of the statistical properties of the rain field is
based on data collected during GATE. This is one of the most
comprehensive sets of rain measurements made over the ocean.

1. GATE Surface Rainfall Data: GATE was an observational
program conducted in the summer of 1974. During three
periods of roughly three weeks, each termed a phase, detailed
rainfall measurements from rain gauges and radars on an array
of research vessels were made over an area called the
B-scale. The B-scale area is centered at 8.5N, 23.5W
and is about 200 km in diameter.
Arkell and Hudlow (1977) composited the radar measurements
from ships and presented an atlas of the radar echoes at 15
minute intervals. Patterson, et al. (1979), converted the
radar measurements to rainrates and presented rainrate data
in 4 km by 4 km bins.

2. CAPPI: For the height of the rain column, we used the
Constant Altitude Plan-Position Indicator (CAPPI) radar data
taken onboard the research vessel "Oceanographer," which
was positioned at the center of the B-scale area in GATE, but
was moved to the southeast quadrant. The original data were
taken from the plan-position indicator (PPI) for elevation
angles of about 1.5 to 22 degrees. Pytlowany, et al. (1979),
converted the data from elevation-distance coordinates to
constant altitude plan-position coordinates, with a vertical
resolution of about 1 km. The maximum echo height reported
is 12 km, i.e., echoes at greater heights are truncated at 12 km.
These data cover 3 convectively active days in each phase of
GATE.

4.2 The Mixed Distribution Model 

An objective of TRMM is to obtain monthly averages of rainfall. 
If rainfall rates can be described by a class of statistical
distributions, the estimation procedure can be simplified since only a
few parameters of the distribution need to be estimated. We examined 
the GATE data and found that the rainrates can be described by a mixed 
distribution (attachment B). The mixed distribution consists of a 
finite probability of no rain and a continuous distribution for the 
rainy part. Conditional on rain, it was shown that the lognormal 
distribution provides an excellent fit to the data. A detailed 
description of the model and its application to sampling studies in 
GATE can be found in attachment B of this report. 
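
As an illustration of how few parameters are involved, the sketch
below fits a mixed lognormal distribution to a sample and evaluates
its mean and variance in closed form. It assumes the rainy part is
lognormal with log-mean mu and log-standard deviation sigma and that
p is the probability of rain; these symbols and the function names
are introduced here for illustration and may differ from the
notation of attachment B.

```python
import math
import numpy as np

def fit_mixed_lognormal(rain):
    """Estimate (p, mu, sigma) from a sample of rainrates with zeros."""
    rain = np.asarray(rain, dtype=float)
    wet = rain[rain > 0]
    p = wet.size / rain.size           # probability of rain
    logs = np.log(wet)
    return p, logs.mean(), logs.std()  # moments of log rainrate, rainy part

def mixed_lognormal_moments(p, mu, sigma):
    """Mean and variance of the mixed distribution (zeros included)."""
    mean = p * math.exp(mu + 0.5 * sigma ** 2)
    second_moment = p * math.exp(2.0 * mu + 2.0 * sigma ** 2)
    return mean, second_moment - mean ** 2

# Synthetic sample: 30 percent rainy, lognormal rainrates when raining.
rng = np.random.default_rng(0)
n = 50000
is_wet = rng.random(n) < 0.3
sample = np.where(is_wet, rng.lognormal(mean=0.5, sigma=1.0, size=n), 0.0)

p, mu, sigma = fit_mixed_lognormal(sample)
print(p, mu, sigma)
print(mixed_lognormal_moments(p, mu, sigma))
print(sample.mean(), sample.var())     # direct sample moments, for comparison
```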

4.3 Intermittency 

Intermittency refers to sporadic changes in a field of 
turbulence. It expresses the fact that turbulence does not fill the 
whole space in a turbulent flow. This is an important aspect of
turbulent flows which, despite much work, is far from being completely
understood (Schertzer and Lovejoy 1985). A measure of intermittency
is the fraction of time in which an event occurs over a period or
distance (Tennekes and Lumley 1974). For extreme events, we expect
this measure of intermittency to increase as turbulence sets in 
through flow instability, reaches some peak value and then decreases 
as the energy of the turbulent flow is cascaded to smaller scales 
through dissipative losses. 

In a rough sense, we can consider the rain fields as fields of
turbulence. Precipitation can be considered an "extreme" event, an 
index of moist instability. The fraction of time/space that this 
event occurs is a measure of intermittency. 

An important parameter in the estimation of total rainfall from a
GATE scene is the fractional rain area (Chiu, et al., 1986). Figure 1
shows scatter diagrams of the average rainfall rate for a GATE scan
against the fractional rain area with a rainfall rate of 1 mm/hr and
above, on logarithmic scales. The correlations between the two
variables are extremely high for both phases of GATE. It is
interesting to note that this correlation of 0.99 is higher than the
correlation of 0.87 between rainrate and Arkin's cloud index. Since
the rainfall total (R) for a GATE scene is the product of the
fractional area (p) and the average rainrate for the rainy pixels (a),
or R = pa, we can take the logarithm of both sides and express the
variance of log R as the sum of the variances of log p and log a and
twice their covariance. The contributions from the various terms are
given below for GATE 1 and 2.

    $\mathrm{var}(\log R) = \mathrm{var}(\log p) + \mathrm{var}(\log a) + 2\,\mathrm{cov}(\log p, \log a)$

              var(log R)   var(log p)   var(log a)   2 cov(log p, log a)
    GATE 1      100%          77%           3%             20%
    GATE 2      100%          77%           3%             20%
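
The variance budget above can be computed from per-scene values of
p and a as sketched below. The input arrays in the example are
made-up numbers used only to exercise the function; they are not
GATE values, and the function name is hypothetical.

```python
import numpy as np

def log_variance_budget(p, a):
    """Fractions of var(log R) due to log p, log a and their covariance.

    p: fractional rain area per scene; a: mean rainrate over rainy
    pixels per scene; R = p * a. All values must be positive.
    """
    log_p = np.log(np.asarray(p, dtype=float))
    log_a = np.log(np.asarray(a, dtype=float))
    total = np.var(log_p + log_a)                  # var(log R)
    cov = np.mean((log_p - log_p.mean()) * (log_a - log_a.mean()))
    return {
        "var(log p)": np.var(log_p) / total,
        "var(log a)": np.var(log_a) / total,
        "2 cov(log p, log a)": 2.0 * cov / total,
    }

# Made-up scenes, for illustration only.
p = np.array([0.02, 0.05, 0.10, 0.20, 0.35])
a = np.array([2.5, 3.0, 3.2, 4.1, 4.0])
budget = log_variance_budget(p, a)
for name, share in budget.items():
    print(f"{name}: {100.0 * share:.0f}%")
```

The three shares sum to one by construction, which is a useful check
when the calculation is applied to real scenes.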

We point out that this index of fractional rain area is
equivalent to the so-called "Area Time Integral (ATI)" used in radar
meteorology to estimate rain volume. The ATI is the time integral of
the area of radar echoes. It has been shown that the total rain volume
of a system can be obtained by multiplying the ATI by some
climatological mean rainfall rate (Doneaud, et al. 1981). Jackson
(1986) examined the contributions of the number of rain days in a
month and the average intensity of rainfall during rain days at
tropical stations to the monthly rainfall. It was found that the
number of rain days is the dominant factor in determining the monthly
total. These findings are consistent with our results on the analysis
of GATE data and the logistic model.

4.4 Spatial and Temporal Rain Structure

Because of the phenomenon of intermittency in rain fields, it is
difficult to define the usual characteristic functions of a turbulent
field such as correlation or autocorrelation functions. For example,
the autocorrelation functions will have a long tail at long
separations due to the abundance of no rain observations.

We have therefore examined the structure of the rain field in terms of
conditional probabilities.

Figure 2 shows the probability of observing a rainrate of 1 mm/hr
at a fixed location (4 km by 4 km pixel) in GATE at different time
lags, conditional on observing such an event at time zero. It can be
seen that the conditional probability drops off rather rapidly but
reaches a secondary maximum at about 10-12 hours. The condition
for independence is derived in the appendix and is plotted on the same
graph. This assumes no sampling error or persistence in the data.
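
A lagged conditional probability of this kind can be estimated from
a time series at one pixel as sketched below. The function name and
the test series are hypothetical. For reference the sketch also
computes the unconditional probability of the event, which is the
value the conditional probability takes if observations at the two
times are independent; the derivation in the appendix is not
reproduced here.

```python
import numpy as np

def conditional_rain_prob(series, lag, thresh=1.0):
    """Estimate P(rain at t + lag | rain at t) for one pixel.

    series: rainrates at regular time steps (e.g. every 15 minutes);
    lag: non-negative number of time steps; rain means >= thresh.
    Returns nan if the conditioning event never occurs.
    """
    wet = np.asarray(series, dtype=float) >= thresh
    now = wet[: len(wet) - lag]      # conditioning times
    later = wet[lag:]                # same pixel, lag steps later
    if now.sum() == 0:
        return float("nan")
    return float((now & later).sum() / now.sum())

# Short illustrative series (mm/hr), not GATE data.
series = [2.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
for lag in range(5):
    print(lag, conditional_rain_prob(series, lag))

# Reference level under independence: the unconditional probability.
print(np.mean(np.asarray(series) >= 1.0))
```

The same function applies to spatial lags if the series is taken
along a line of pixels instead of along time.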

Figure 3A. Lines of constant probability of observing R > 1 mm/hr
conditional on observing R > 1 mm/hr at a distance for GATE 1.
Adjacent marks on the boundaries are 12 km apart.

A similar calculation is performed for the conditional probability
in space. Figure 3 shows lines of constant probability as a function
of distance, conditional on the event of having 1 mm/hr at a 4 km by 4
km pixel, for GATE 1 and 2. The anisotropy in space is clearly
discernible for GATE 2. The more or less east-west orientation of the
lines of constant probability is consistent with the meteorological
conditions in GATE, namely the passage of elongated rain bands
oriented in the east-west direction.

4.5 Cloud Height 

In the retrieval of rainfall rate from microwave temperature
measurements, a number of parameters enter into the retrieval. Since
these parameters are quite variable, errors are introduced into the
estimation scheme if some constant value is used. An important
parameter is the height of the rain column. The bias due to the rain
cloud height can be estimated as follows. The attenuation of
microwave radiation (or change in the optical thickness $\tau$) in the
presence of rain can be written as

    $\Delta\tau = a h R^b$    (1)

where a and b are functions of frequency, drop size distribution and
temperature of the drops. Olsen, et al. (1978), have examined the
dependence of a and b over a broad range of frequencies and for
different drop size distributions at various temperatures. At a
frequency of about 20 GHz and 0 degrees Celsius,

    $0.05 < a < 0.09$

and

    $0.9 < b < 1.1$

with R the rainrate in mm/hr and h the effective height of the rain
column in km. Oftentimes h is defined in terms of the attenuation as

    $h = \Delta\tau / (a R^b)$

We would like to get some idea of the distribution of the height of
rain columns, so that an estimate can be made of the bias incurred by
using a climatological height in the estimation from microwave sensors.

From equation (1) above (for simplicity, assume b = 1), an
estimate of the rainrate using a climatological cloud height,
$\langle h \rangle$, where $\langle \; \rangle$ denotes ensemble
averaged quantities, is

    $R(\langle h \rangle) = (\Delta\tau / a)(1 / \langle h \rangle)$

The relative bias can now be written as

    $B = (\langle R \rangle - R(\langle h \rangle)) / R(\langle h \rangle) = \langle R \rangle / R(\langle h \rangle) - 1$

where

    $\langle R \rangle = \int (\Delta\tau / a)(1/h)\, p(h)\, dh$

The factor $\Delta\tau / a$ cancels out, and

    $\langle R \rangle / R(\langle h \rangle) = \langle h \rangle \langle 1/h \rangle$


To calculate these quantities, the distribution p(h) of h is
needed. The data we used to calculate p(h) are the so-called "CAPPI"
(Constant Altitude Plan-Position Indicator) data of GATE (Pytlowany,
et al., 1978). Figure 4 shows the histograms of height obtained from
3 days of data in each phase of GATE. We have taken individual pixels
in calculating the statistics, as opposed to earlier work which counts
a rain cloud as an entity (e.g., Houze and Cheng, 1979). Our emphasis
here is the estimation and correction of the bias associated with
satellite retrieval. Because of the noise in the radar reflectivity,
we have set a low threshold of 24 dBZ, corresponding to a rainrate of 1
mm/hr. The histograms show bimodal distributions in GATE 2 and 3,
with peaks at 5 and 8 km, whereas this feature is absent
in GATE 1. The double peaks are also present if the statistics are
calculated over cloud clusters (Houze and Cheng 1979).

We note that the bias is extremely sensitive to the population
at the low cloud heights. If the threshold value is changed to the
lowest detectable level, the whole histogram rises over all ranges of
height. The increase in population at low heights increases
the bias substantially (from about 25 to 50 percent).
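
The bias calculation can be sketched directly from a height
histogram. With b = 1 the relative bias reduces to the mean height
times the mean inverse height, minus one. The histogram in the
example is invented for illustration and is not the data of Figure
4; the function name is hypothetical.

```python
import numpy as np

def height_bias(heights_km, counts):
    """Relative bias <h><1/h> - 1 from a histogram of column heights."""
    h = np.asarray(heights_km, dtype=float)
    w = np.asarray(counts, dtype=float)
    w = w / w.sum()                  # normalize counts to p(h)
    mean_h = np.sum(w * h)           # climatological height <h>
    mean_inv_h = np.sum(w / h)       # <1/h>
    return mean_h * mean_inv_h - 1.0

# Invented histogram over 1 km bins from 1 to 12 km.
heights = np.arange(1.0, 13.0)
counts = np.array([2, 4, 8, 12, 16, 12, 10, 12, 8, 4, 2, 1])
print(height_bias(heights, counts))

# More population at the lowest heights gives a larger bias here.
more_low = counts + np.array([20, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(height_bias(heights, more_low))
```

The bias is zero only when all columns have the same height, and it
is dominated by the lowest bins because of the 1/h weighting.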

Another point is that R and h are related: one expects a higher
rainrate to be associated with a higher cloud top. Adler and Mack
(1984) have examined the usefulness of this relation and other
environmental information to estimate rainfall. Figure 5 shows a two
dimensional distribution of h and radar reflectivity for the same
GATE CAPPI data. The shape of the locus of the maxima of the
distributions agrees well with the rainrate-cloud height relation
observed in tropical storms (Adler and Mack 1984, their Figure 1).

5. Rain Field Models 


5.1 Simulation Model 


To extend the data base beyond the scope of GATE for the purposes 
of sampling studies and the calibration of the logistic model, a 
simulation model of rain fields is developed which preserves the
characteristics of GATE rainfall, namely the fractional rain area and
the lognormality of the rainy part of the distribution. A description of
the model is given in attachment C. This model is capable of 
producing realistic rain fields. 

Laughlin (1982) examined the errors associated with satellite 
sampling and computed the temporal autocorrelation function for 
different area averages for GATE. From the autocorrelation 
functions, sampling requirements for different area averages are
calculated. The temporal autocorrelation function for different areal
averages in our model is also calculated. Our results are similar to
those computed by Laughlin (1982) (see attachment C).

5.2 Stochastic Regression Model 

A regression model of replacement and immigration is also 
developed (Kedem and Chiu 1986) (attachment D). In this model, the
number of raindrops within a rain volume is considered a random 
variable which can be changed by replacement and/or immigration. 

The model takes the form

    $X_n = \sum_{i=1}^{X_{n-1}} Y_{n,i} + I_n, \quad n = 1, 2, \ldots$

where $X_n$ is the number of drops at the nth step, each of which can
be replaced by $Y_{n,i}$ fresh drops, and $I_n$ denotes the number of
immigrants entering the rain volume. It can be shown that if

    $E(Y_{n,i})$ is small but greater than zero; and

    $E(I_n)$ is close to but less than unity

then $X_n$ follows a lognormal distribution. This provides a
justification for the use of the lognormal distribution in fitting 
rainfall data. It also bypasses the use of the law of proportionate 
effect often quoted to achieve lognormality. When the model
parameters were estimated from the GATE data, it was found that these
conditions are satisfied for large area averages. Since the sampling 
frequency is 15 minutes during GATE, this result suggests that there 
is a spatial and temporal range in which the lognormal distribution 
can provide a good description of the rainfall rates. The range over 
which the lognormal distribution provides a good fit to the data is 
investigated in the following section. 
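
The recursion of the replacement and immigration model can be
simulated as sketched below. The sketch assumes Poisson
distributions for both the replacement counts and the immigrants;
this choice, the parameter names m and lam, and the function name
are assumptions made here for illustration and are not taken from
attachment D. The sketch only generates sample paths and makes no
claim about the limiting distribution.

```python
import numpy as np

def simulate_branching(n_steps, m, lam, x0=0, seed=0):
    """Simulate X_n = sum_{i=1}^{X_{n-1}} Y_{n,i} + I_n.

    Assumes Y ~ Poisson(m) fresh drops per existing drop and
    I ~ Poisson(lam) immigrants per step (illustrative choices).
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1, dtype=np.int64)
    x[0] = x0
    for n in range(1, n_steps + 1):
        # Each of the X_{n-1} drops is replaced by a random number of drops.
        replaced = rng.poisson(m, size=x[n - 1]).sum()
        x[n] = replaced + rng.poisson(lam)
    return x

path = simulate_branching(n_steps=20000, m=0.5, lam=2.0)
print(path[:20])
# With these Poisson choices and m < 1 the long-run mean is lam / (1 - m).
print(path[1000:].mean(), 2.0 / (1.0 - 0.5))
```

Estimates of E(Y) and E(I) from data could be compared with the
limiting conditions stated above; that step is not shown here.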

6. Scale Dependence of Rain Field Parameters

The three parameters of a mixed lognormal distribution that
describe a rainfall distribution depend on the scale of
averaging. The threshold that defines extreme events (in this case
precipitation) is therefore also dependent on the averaging time/area.
An obvious question then is over what range in time and space
lognormality provides a good description of rainrate distributions.
As a practical concern, it is of interest to examine the dependence of
the intermittency factor on the pixel size, which is determined by the
resolution of satellite sensors and the altitude of the orbits.

We have examined the GATE data for different area averages. The
three parameters of the mixed distribution model of the GATE
rain field have been computed for different averaging areas in the
range from 4 km to about 350 km (the whole of the GATE B-scale area)
on a side. Figure 6 shows the results on a log-log scale. The linear
relation between the log of the parameters and the log of the square
root of the averaging area is clearly discernible, at least over the
range from areas of 4 km to 80 km on a side. The linear dependence
suggests a power law dependence of the parameters on the averaging
area for a sampling frequency of 15 minutes.
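
A power law of this kind can be estimated by a straight-line fit in
log-log coordinates, as sketched below. The parameter values in the
example are synthetic and follow an exact power law by construction;
they are not the values plotted in Figure 6, and the function name
is hypothetical.

```python
import numpy as np

def fit_power_law(side_km, value):
    """Fit value = c * side_km ** k by least squares in log-log space.

    Returns (c, k). Both inputs must be positive.
    """
    k, log_c = np.polyfit(np.log(side_km), np.log(value), 1)
    return float(np.exp(log_c)), float(k)

# Synthetic parameter values at several pixel sizes (illustrative only).
side = np.array([4.0, 8.0, 16.0, 40.0, 80.0])
param = 3.0 * side ** -0.5
c, k = fit_power_law(side, param)
print(c, k)      # close to 3.0 and -0.5
```

Restricting the fit to the range where the log-log plot is straight
(4 km to 80 km in the text above) avoids contamination from the
largest averaging areas.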

Figure 7 shows the histograms of rainrates for square pixels of
4, 40, 80, and about 350 km on a side. The histograms are
calculated on a logarithmic scale. The logarithmic scale is used
because a lognormal distribution on a linear scale is a normal
distribution on a log scale. Another advantage of using the
logarithmic scale is that the no rain category appears at minus
infinity. Hence a threshold for the occurrence of events can be
defined with no ambiguity.

A general shift from high values toward low values is noted as the
resolution decreases. The skewness of the curve also increases,
because the spatial averaging process smooths out the high rainrates
and inflates the population at the low rainrate portion. These shifts
occur when nonrainy pixels are averaged with rainy pixels.

7. Summary And Estimate of Technical Feasibility 

A logistic regression model has been developed to estimate the
probability of rainfall given covariate observations such as
radiometric measurements. The parameters of the model are estimated
by maximizing a likelihood function. The significance of the
estimators of the model can be readily tested by a ratio of the
likelihoods. This method of testing allows identification of
important covariates as well as the way in which the covariates enter
into the estimation. The logistic model has been tested on the
rainfall data collected during phase 1 of GATE and successfully
predicts the observations for phase 2 of GATE. A major finding is the
usefulness of the fractional rain area within a pixel. This parameter
gives a better regression model than that which uses only the
fractional area of heavy precipitation. The index of heavy
precipitation area is interpreted as the cloud index used by Arkin in
estimating rainfall from infrared measurements.

To investigate this relation further, a correlation analysis was
performed on the logarithm of GATE rainfall data and the logarithm of 
the fractional rain area. Correlation coefficients of 0.99 are 
obtained for both phases of GATE. These coefficients are larger than 
the value of 0.87 between the cloud index proposed by Arkin and the 
total rain volume. 

To estimate the mean and variance of areal average rainfall, a
mixed distribution model was proposed and was found to model the
distribution of rainfall data in GATE quite well. The
mixed distribution model consists of two parts: a discrete
probability of no rain and a continuous distribution which describes
the rainy part of the mixed distribution. It was found that the rainy
part of the distribution is fairly well described by a lognormal
distribution. The discrete part of the mixed distribution is
interpreted as a measure of intermittency, a concept familiar from
the study of turbulent flows.

Because of the nature of intermittency, we propose the use of the 
conditional probability in describing the rain field. The probability 
of rain conditional on rain at a different time/space for the GATE 
period is computed. The anisotropy in space is clearly discernible 
for GATE 2. 

To broaden the data base for the testing of the logistic model, 
a data set of three dimensional rain cloud structure derived from 
radar echoes during GATE is used to compile a data set of cloud height 
and surface rainfall. The conditional probability distribution of 
cloud height and surface rainfall is calculated. The relationship 
between surface rainfall and cloud height is consistent with earlier 
results on tropical cloud systems. 

A model is developed to simulate rain fields observed in GATE. 
The simulation model preserves the lognormality and intermittency 
characteristics of GATE, and the temporal autocorrelation function 
computed from rain fields generated by the model is very similar to 
that used by Laughlin (1982) in estimating the sampling errors 
associated with satellite observations. 

A regression model of replacement and immigration is also 
developed which is capable of producing a lognormal distribution in 
some asymptotic limits. These asymptotic conditions are observed in 
GATE for large area averages (40 km on a side) but not for small 
area averages (4 km on a side). 

Since the GATE data are taken every 15 minutes, this suggests that 
the lognormal distribution is a valid approximation within some range 
of averaging in time and space. This range of validity is 
investigated by computing the parameters of the mixed lognormal 
distribution model for different area averages in GATE. It was found 
that these parameters vary as a power of the averaging area, at least 
over the range of areas from 4 to 80 km on a side. 

We have demonstrated the feasibility of using logistic 
regression to identify important covariates in the estimation of 
rainfall. A logical next step is to refine the logistic technique by 
the method of partial likelihood (Cox 1975). This method allows the 
assumption of independence of the estimators to be dropped. To 
examine the contribution of the radiometric data from the different 
atmospheric channels, we need to put together a data set of concurrent 
visible, infrared and microwave data. Model-generated rain fields are 
also needed to calibrate the logistic technique. Rainfall statistics 
derived from analyses of the rainfall data sets will prove 
useful in providing the required constraints for these simulation 
models. 





APPENDIX: Criterion for Independence of Conditional Probabilities 


We want to compute the lag time between observations such that 
the observed events become statistically independent. Assuming 
stationarity, the condition for statistical independence can be 
obtained as follows. Let $A$ ($B$) be the event that the rainrate $R$ at 
time $t$ ($t - \tau$) at a fixed location is greater than some prescribed 
value, say $R_0$, i.e., 

    $A$: $R(t) > R_0$ 

    $B$: $R(t - \tau) > R_0$ 

The probability of $A$ conditional on $B$ can be written as 

    $P(A|B) = P(A \cap B)/P(B)$ 

If $A$ and $B$ become statistically independent, then 

    $P(A \cap B) = P(A) P(B)$ 

so the condition for statistical independence is 

    $P(A|B) = P(A)$ 

where $P(A)$ is the probability of rainrate greater than $R_0$. 
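
The criterion above can be checked numerically on a rainrate time series. The following is only a minimal sketch: the function names, the tolerance `tol` and the synthetic series in the usage note are illustrative assumptions, not part of the original analysis.

```python
def exceedance_probs(rain, threshold, lag):
    """Return (P(A), P(A|B)) with A: R(t) > threshold, B: R(t - lag) > threshold."""
    a = [r > threshold for r in rain[lag:]]              # events at time t
    b = [r > threshold for r in rain[:len(rain) - lag]]  # events at time t - lag
    n_b = sum(b)
    p_a = sum(a) / len(a)
    # Conditional probability is undefined (nan) if B never occurs.
    p_a_given_b = sum(x and y for x, y in zip(a, b)) / n_b if n_b else float("nan")
    return p_a, p_a_given_b


def independence_lag(rain, threshold, tol=0.05):
    """Smallest lag (in samples) at which |P(A|B) - P(A)| <= tol, else None."""
    for lag in range(1, len(rain) // 2):
        p_a, p_a_given_b = exceedance_probs(rain, threshold, lag)
        if abs(p_a_given_b - p_a) <= tol:
            return lag
    return None
```

For data sampled every 15 minutes, a returned lag of k samples corresponds to a lag time of 15k minutes. A call such as `independence_lag(series, 1.0)` would use the 1 mm/hr threshold; the choice of tolerance is a judgment call.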
1. INTRODUCTION 

The retrieval of meteorological quantities 
from satellite observations is based on 
covariate information such as radiometric 
measurements or physical quantities derived from 
them. The covariate information is influenced 
by factors other than the desired meteorological 
variable. The situation is further complicated 
by the different resolutions of the different 
sensors. It is useful to identify important 
covariates for the prioritization of data 
transmission and to ascertain the possibility 
of on-board processing. 

Accurate measurement of tropical rainfall 
is crucial for the advancement of our 
understanding of the large-scale dynamics of the 
ocean/atmosphere system. An account of rainfall 
monitoring techniques from satellites is 
given by Barrett and Martin (1981). A satellite 
mission for the monitoring of tropical rainfall 
has been proposed to NASA (Theon et al. 1986). 
Three instruments are proposed for the mission: 
a radar, an Advanced Very High Resolution 
Radiometer (AVHRR) and a microwave instrument, 
possibly an Electrically Scanning Microwave 
Radiometer (ESMR). The expected outcome from 
this mission is at least three years of rainfall 
data derived from concomitant covariate 
observations. 

In the following, a logistic model that 
can effectively accommodate covariate information, 
but which has not been used in the context of 
rainfall estimation, is described. A major 
difference between linear regression and logistic 
regression is that the former technique maximizes 
the variance explained, while in logistic 
regression a likelihood function, or probability 
of an event, is maximized. The output from such 
a model is the distribution of rainrate categories, 
from which standard errors can be estimated. 
The significance of the covariates can 
be tested rather readily. An example of the 
logistic model is given for the scenario of the 
proposed tropical rainfall monitoring mission, 
from which microwave observations and fractional 
rain area measurements may be available. 

2. THE LOGISTIC MODEL 

The logistic model is useful in determining 
the relationship between the distribution 
of a random variable and a set of covariates. 
It has been applied in various forms in 
reliability testing and the analysis of survival 
data (Cox and Oakes, 1984). A detailed treatment of 
the logistic model is given by Cox (1970). We 
are interested in the relationship between the 
rainfall rate averaged over an area, $R$, and the 
vector of covariate variables $\mathbf{t}$ related to $R$. 
For the event $R > R_0$, the logistic model is given 
by 

    $P(R > R_0) = [1 + \exp(-\mathbf{b}'\mathbf{t})]^{-1}$ 

where $P(R > R_0)$ is the probability that the 
rainfall rate $R$ exceeds $R_0$ and 

    $\mathbf{b} = (b_0, b_1, b_2, \ldots, b_k)$ 

is a vector of constants. From $n$ observations of 
$R$, the $b$'s can be estimated by the method of 
maximum likelihood. Let $R_0$ be fixed so that 
a binary variable $Y$ can be defined as 

    $Y = 1$ if $R > R_0$, and $Y = 0$ otherwise. 

The logistic model becomes 

    $P(Y = 1) = [1 + \exp(-(b_0 + b_1 t_1 + \ldots + b_k t_k))]^{-1}$ 

where the $t$'s are covariate variables. We assume 
$Y_1, Y_2, \ldots, Y_n$ are conditionally independent 
given the covariate information. Then the 
likelihood function $L(\mathbf{b})$ is given by 

    $L(\mathbf{b}) = \prod_{i=1}^{n} [\exp(\mathbf{t}_i'\mathbf{b})]^{Y_i} / [1 + \exp(\mathbf{t}_i'\mathbf{b})]$ 

and the asymptotic covariance matrix is given by 

    $[-E(\partial^2 \log L(\mathbf{b}) / \partial b_i \partial b_j)]^{-1}$ 

where $E$ is the expected value. To test the 
significance of the regression coefficients, we use 
the likelihood ratio test 

    $\lambda = -2 \log(L_0/L_1)$ 

where $L_1$ is the maximized likelihood under the 
full model and $L_0$ is the maximized likelihood 
under the hypothesis that some of the 
regression coefficients are zero. If $q$ of the $b$'s 
are assumed to vanish, then $\lambda$ follows 
asymptotically a chi-square distribution with $q$ 
degrees of freedom. 
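
As an illustration of the maximum likelihood step, the sketch below fits a logistic model with a single covariate by Newton-Raphson iteration. It is a minimal example under stated assumptions: the function names, the restriction to one covariate and the fixed iteration count are choices made here for brevity, not the procedure actually used in the study.

```python
import math


def fit_logistic(t, y, iters=25):
    """Newton-Raphson MLE of (b0, b1) in P(Y=1) = 1/(1 + exp(-(b0 + b1*t)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ti, yi in zip(t, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * ti)))
            w = p * (1.0 - p)
            g0 += yi - p                # score with respect to b0
            g1 += (yi - p) * ti         # score with respect to b1
            h00 += w                    # information matrix entries
            h01 += w * ti
            h11 += w * ti * ti
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (information matrix)^-1 * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(b0, b1, t, y):
    """log L(b) = sum_i [ Y_i * (b0 + b1*t_i) - log(1 + exp(b0 + b1*t_i)) ]."""
    return sum(yi * (b0 + b1 * ti) - math.log1p(math.exp(b0 + b1 * ti))
               for ti, yi in zip(t, y))
```

The inverse of the information matrix accumulated in the last iteration gives the asymptotic covariance matrix of the estimates. The likelihood ratio statistic for dropping the covariate is then minus twice the difference between `log_likelihood` for the intercept-only fit and for the full fit.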
We consider the scenario when concomitant 
observations of microwave temperature and 
fractional rain area are available. The rainfall 
data collected during phase I of GATE are 
used. The basic data are radar-estimated 
rainrates on 4 by 4 km pixels, and measurements are 
made at 15 minute intervals. From the basic 
data, temperature and fractional rain area data 
for the scenario are generated as follows. 

We assume that the microwave instrument 
measures the temperature over an area of 32 by 
32 km (i.e. 8 by 8 pixels), which is the unit 
area for our scenario. To calculate a mean 
temperature over the 32 by 32 km box, a simple 
relation between the rainrate ($r$) and 
temperature ($T_R$), 

    $T_R(r) = T_{av}(1 - x) + T_s \epsilon x + (1 - \epsilon) T_{av} (1 - x) x$    (1) 

is used, where $T_{av}$ is the average temperature of the 
atmospheric column (≈270 K), $T_s$ is the surface 
temperature (≈290 K), $\epsilon$ is the surface emissivity 
(≈0.5), $x = \exp(-\tau)$ and $\tau$ is the optical thickness. 
$\tau$ is approximated as $\tau \approx 0.2r$, with $r$ in mm/hr. 
The dependence of $x$ on the height of the rain 
column is ignored in this case. The functional 
relation between $R$ and $T_R$ is shown in fig. 1. 

From the rainrate at each pixel, a microwave 
temperature is computed. The microwave 
temperatures are then averaged over the 32 by 32 km box 
to yield the average temperature ($T$). The 
fractional rain area ($F$) is obtained by dividing 
the number of pixels with rainrate in excess of 
1 mm/hr by 64. The box averaged rainrate ($R$) is 
obtained by averaging the rainrates over the 
64 pixels. Fig. 1 shows the scatter diagram 
of $R$ versus $T$. The fact that $T_R(R)$ is greater 
than $T$ follows from Jensen's inequality 
(Feller, 1966). Fig. 2 shows the relationship 
between $F$ and $R$. The strong correlation 
between $R$ and $F$ was also noted by Lovejoy (1980) 
for phase III of GATE for the whole GATE 
area. Also included in our analysis is the 
fractional rain area with rainrates in excess of 
20 mm/hr ($F_1$). The data have been extracted 
from a 32 by 32 km grid box in the center of 
the GATE area. Characteristics of the time 
series are summarized in table 1. Data from 
another box located approximately 100 km to the 
south of the first are used for validation. 
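
The construction of the covariates for one 32 by 32 km box can be sketched as follows. This is an illustrative reading of relation (1) with the constants quoted in the text; the function names and the example box in the usage note are assumptions.

```python
import math

# Constants quoted in the text for relation (1)
T_AV = 270.0       # average temperature of the atmospheric column (K)
T_S = 290.0        # surface temperature (K)
EMISSIVITY = 0.5   # surface emissivity


def microwave_temperature(r):
    """Microwave temperature T_R(r) for rainrate r in mm/hr, relation (1)."""
    x = math.exp(-0.2 * r)  # x = exp(-tau) with tau approximated as 0.2 r
    return T_AV * (1 - x) + T_S * EMISSIVITY * x + (1 - EMISSIVITY) * T_AV * (1 - x) * x


def box_statistics(rainrates, rain_threshold=1.0):
    """Return (R, T, F): box-mean rainrate, box-mean temperature, rain fraction."""
    n = len(rainrates)
    r_bar = sum(rainrates) / n
    t_bar = sum(microwave_temperature(r) for r in rainrates) / n
    frac = sum(r > rain_threshold for r in rainrates) / n
    return r_bar, t_bar, frac
```

For a hypothetical box of 64 pixels in which 8 pixels rain at 16 mm/hr and the rest are dry, `box_statistics` gives R = 2 mm/hr and F = 0.125, and the averaged temperature T falls well below `microwave_temperature(2.0)`, which is the behaviour attributed to Jensen's inequality above. Calling `box_statistics` with `rain_threshold=20.0` gives the heavy-rain fraction in the same way.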

Table 1. Characteristics of the time series 

Variable      mean          s.d.     minimum   maximum 

R (mm/hr)     .44           1.48     0.0       17.5 
T (K)         151.5         16.67    145.0     263.1 
F             (illegible)   .17      0.0       1.0 
F_1           .006          .03      0.0       .47 


Fig. 1. Scatter diagram of rainfall rate over 
the box (R) and microwave temperature (T). The 
dashed curve shows the functional relationship 
between $T_R$ and R. 



Fig. 2. Scatter diagram of rainfall rate (R) 
and fractional rain area with rainrates in excess 
of 1 mm/hr (F) . 

4. RESULTS 

The full logistic model is of the form 

    $P(R > 1) = [1 + \exp(-(b_0 + b_1 F + b_2 F_1 + b_3 T + b_4 T_1))]^{-1}$ 

where $R$, $F$, $F_1$ and $T$ are the rainrate, the fractional 
area with rainrate in excess of 1 mm/hr, the 
fractional area with rainrate in excess of 20 mm/hr 
and the temperature over the 32 by 32 km box. 
$T_1$ is $T$ lagged by 1 time unit (i.e. 15 minutes). 

A total of 10 different models have been run and 
the regression coefficients are presented in 
table 2. The maximum log likelihood ranges from 
-309.6 for a model with $F_1$ as the only regressor 
(model 10) to -28.6 for the full model (model 1). 
From the table, important covariates can be 
identified. For illustration purposes, we 
consider models 1 and 8. The hypothesis we want 
to test is 


    $H_0$: $b_2 = b_3 = b_4 = 0$ 

The likelihood ratio test yields 

    $\lambda = -2[-159.1 + 28.6] = 261$ 

and the 5% significance level of $\chi^2(3)$ is 7.81. 
Hence $H_0$ is rejected. To see if covariate $T_1$ 
contributes to the estimation, the hypothesis 
$H_0$: $b_4 = 0$ is tested. To do this, we compare 
models 1 and 2 and obtain the $\lambda$ value of 
$-2[-30.1 + 28.6] = 3$. The 5% level for $\chi^2(1)$ is 
3.8 and $H_0$ has to be accepted. In this way, it 
is readily seen that $b_1$, $b_2$ and $b_3$ are highly 
significant. 
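
The two tests above can be reproduced from the maximized log likelihoods in table 2. The sketch below is illustrative only: the function names are assumptions, and the chi-square tail probabilities use closed forms that hold for 1 and 3 degrees of freedom, which are the only cases needed here.

```python
import math


def likelihood_ratio(loglik_reduced, loglik_full):
    """lambda = -2 log(L0/L1), computed from the two maximized log likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_tail(x, q):
    """P(chi-square with q degrees of freedom > x), closed forms for q = 1 or 3."""
    base = math.erfc(math.sqrt(x / 2.0))
    if q == 1:
        return base
    if q == 3:
        return base + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("only q = 1 and q = 3 are implemented in this sketch")


def reject_null(loglik_reduced, loglik_full, q, level=0.05):
    """True if the reduced model is rejected at the given significance level."""
    return chi2_tail(likelihood_ratio(loglik_reduced, loglik_full), q) < level
```

With the table values, `reject_null(-159.1, -28.6, 3)` compares models 8 and 1, and `reject_null(-30.1, -28.6, 1)` compares models 2 and 1.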

The goodness of the model is tested by 
applying it to the validation time series, which 
is taken from an area to the south of the center 
of GATE (see section 3). Model 2 is adopted for 
validation, i.e. we use 

    $P(R > 1\ \mathrm{mm/hr}) = [1 + \exp(-(-207.8 - 61.7F + 307F_1 + 1.34T))]^{-1}$    (2) 

The values of $F$, $F_1$ and $T$ taken from the location 
designated for validation are substituted in (2) 
and the probability calculated. We define as a 
goodness of fit criterion the mean square error 

    $MSE = (1/n) \sum_{i} (P(Y_i = 1) - Y_i)^2$ 

This is 0.005 for the prediction using model 2. 
It can be seen from fig. 3 that the prediction 
matches the observations very well. 

Fig. 3. Time series of predicted probability of 
exceeding 1 mm/hr and observed rainrate in 
another location. 
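
The validation step can be sketched as below, using the model 2 coefficients from table 2. The function names and the example inputs in the usage note are illustrative assumptions; the validation series itself is not reproduced here.

```python
import math

# Model 2 coefficients from table 2: intercept, F, F_1, T
B0, B1, B2, B3 = -207.8, -61.7, 307.0, 1.34


def predicted_probability(f, f1, t):
    """P(R > 1 mm/hr) from model 2 for fractions f, f1 and temperature t (K)."""
    z = B0 + B1 * f + B2 * f1 + B3 * t
    return 1.0 / (1.0 + math.exp(-z))


def mean_square_error(probabilities, outcomes):
    """MSE = (1/n) * sum (P(Y_i = 1) - Y_i)^2 for binary outcomes Y_i."""
    n = len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / n
```

For a dry box (F = 0, F_1 = 0, T = 145 K) the predicted probability is close to zero, while for a hypothetical rainy box such as F = 0.5, F_1 = 0.05, T = 180 K it is close to one.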

Table 2. Regression Coefficients for Different Logistic Models 

model    b_0        b_1 (F)    b_2 (F_1)   b_3 (T)   b_4 (T_1)   maximized 
                                                                 log likelihood 

 1      -217.7     -66.8       309.8       1.33      0.077       -28.6 
        (46.9)     (16.7)      (61.0)      (.30)     (.046) 
 2      -207.8     -61.7       307.0       1.34      ...         -30.1 
        (44.8)     (16.0)      (59.4)      (.29) 
 3      -17.7       16.6       249.7       ...       0.071       -62.8 
        (4.34)     (3.2)       (31.4)                (.028) 
 4      -7.27       22.5       240.5       ...       ...         -65.9 
        (.68)      (2.4)       (29.3) 
 5      -166.2     -62.8       ...         1.10      ...         -76.2 
        (18.5)     (8.6)                   (.12) 
 6      -48.1       ...        ...         .33      -.035        -115.0 
        (3.46)                             (.04)     (.027) 
 7      -47.9       ...        ...         .29       ...         -116.1 
        (3.4)                              (.02) 
 8      -4.9        21.0       ...         ...       ...         -159.1 
        (.27)      (1.4) 
 9      -34.9       ...        ...         ...       .21         -188.1 
        (2.1)                                        (.01) 
10      -3.1        ...        249.3       ...       ...         -309.6 
        (.12)                  (19.5) 

The s.d.s of the coefficients appear in parentheses. 


5. DISCUSSION 

In our example, the microwave temperature 
data, which integrate the effect of a partially 
filled, non-uniform field of view of the 
microwave sensor, are not independent since they are 
derived from a rainrate-temperature relationship. 
However, in the presence of $T$, the addition of $T_1$ as 
a regressor does not improve the model 
significantly. This can be seen by comparing the 
maximized likelihood in models 1 and 2 and 
models 6 and 7. The interpretation is that most 
of the information is contained in $T$ and the 
addition of $T_1$ adds redundant information. In 
the absence of $T$, the inclusion of $T_1$ improves 
the model, as the maximized log likelihood is 
increased from -65.9 (model 4) to -62.8 (model 
3). This increase is significant by the 
likelihood ratio test. 

The relationship between average rainrate 
and fractional rain area has strong implications 
for rainfall estimation. From infrared imagery, 
Arkin (1979) accounted for a large fraction of 
the rainfall variance by considering the number 
of pixels below a threshold temperature. 
Implicit is the assumption that convective 
rainfall, which is in the heavy rainrate portion of 
the rain spectrum, is produced in deep cumulus 
clouds. How important the heavy rainrates are in 
determining the total rainfall can be examined 
by considering $F_1$. If we compare the maximized 
log likelihood in models 8 and 10, we see that 
model 8, with the fractional area of low rainrates 
(>1 mm/hr) as the only regressor, is better 
than model 10, which uses only $F_1$, in the maximum 
log likelihood sense. If both are used, a 
significantly better model is obtained (model 4). 

In this report, the potential of logistic 
models in rainfall estimation is demonstrated. 
We plan to extend our analysis using actual 
satellite observations such as the ESMR 
measurements taken on board the NIMBUS V 
satellite. 
Abstract 

A technique to determine the time mean areal averaged rainfall 
is developed. The approach taken is to model the distribution of 
rainrate by a mixed distribution. The model is tested on rainfall data 
collected during GATE (the GARP [Global Atmospheric Research Program] 
Atlantic Tropical Experiment). Sampling designs which select only a portion 
of the rain data are used. It was found that a lognormal distribution 
provides an excellent fit to the rainy portion of the distribution. 
The results are insensitive to sampling frequencies in the range 
of half an hour to a few hours in time and 16 to 40 km in space. Sampling 
errors are about 10% of the mean or less for sampling designs which 
mimic observations by satellites that are polar orbiting or have a 
low inclination. An important parameter in the model is the probability 
of rain which correlates significantly with the average rainfall. 

This is consistent with earlier results such as those which relate 
the number of rain days and rain intensity to monthly rainfall and the 
use of the Area Time Integral (ATI) in estimating rain volume. 

The need for microwave sensors in satellite rainfall monitoring systems 
is stressed and an algorithm for estimating monthly mean rainfall from 
microwave sensor measurements such as the Electrically Scanning Microwave 
Radiometer (ESMR) or a radar is proposed. 
1. Introduction 

The latent heat released during the process of precipitation constitutes 
a major component in the forcing of atmospheric circulations (Lorenz 
1967). Theoretical as well as empirical studies have shown that variations 
in tropical forcing are instrumental to anomalous weather patterns 
worldwide (Horel and Wallace 1981, Gill 1982). Accurate measurements of 
precipitation as an index of atmospheric variability are therefore useful 
as a tool in both diagnostic and prognostic studies of atmospheric 
circulations. 

Over land, the problem of estimating time mean areal average rainfall 
has occupied hydrologists for a long time (Eagleson 1967, 
Rodriguez-Iturbe & Mejia 1974, Bras & Rodriguez-Iturbe 1976, Bras & Colon 1978). 
The interest is in river/ground water flow, flood forecasting and catchment 
hydrology. A major emphasis is the modeling of the rain field as a two 
dimensional random field. Once the parameters of the random field are 
estimated, the mean and variance of the rainfall total can be calculated. 
The applicability of various mapping techniques to fill in missing data 
has been assessed by Creutin and Obled (1982) and approaches to network 
designs have been summarized by Moses (1982). 

Because of the huge extent of the tropical oceans and the errors 
associated with in situ measurements on board ships, satellite observation 
is probably the ultimate mode by which precipitation measurements can be 
made over the vast oceans (Austin and Geotis 1982; Atlas and Thiele 
1981). A review of various satellite rainfall estimation techniques is 
given by Barrett and Martin (1981). 
The method of sampling by satellites differs from that by networks 
of land-based rain gauges. The former provides snapshots of precipitation 
information, in terms of radiances from different sensors, while the latter 
gives continuous rain gauge measurements at isolated stations. 

An alternative approach to modeling the temporal and spatial structure 
of the rain field is to consider the distribution of rainfall categories 
in the estimation of time areal mean rainfall. If one considers continuous 
sampling at a fixed location, it is obvious that the rain volume can be 
estimated either through integrating the time series of rainfall rate or 
via computing the mean of the rainrate distribution. Once the distribution 
of rainfall rates is obtained, the mean and variance of the total rainfall 
can be estimated. 

The climatology of heavy rainfall statistics at points or rainfall 
statistics along lines has been studied because of their importance in 
microwave communication (Rogers 1976, Drufuca & Rogers 1978, Lin 1976, 
Freeny & Gabbe 1969). The climatology of rainfall statistics for the whole 
rain spectrum has also been compiled for climatic studies. A common 
feature of these cumulative distributions of rainfall is that their 
functional forms are quite similar for a diversity of geographic regimes 
(Jones and Sims 1978). Oftentimes, a lognormal distribution is quoted. 

The estimation of time mean areal average rainfall is determined 
by two factors: how often does it rain and how hard does it rain when it 
rains? An approach that addresses the first question is the use of the 
so-called "Area Time Integral" (ATI) in estimating rain volume (Lopez 1976, 
Doneaud 1982a). The ATI is the integral over time of the area of 
precipitation as seen by radar. The use of a convective index (Arkin 
1979) and the delineation of rain area from visible and infrared satellite 
imagery (Lovejoy & Austin 1982) seem to fall into this category. 
Jackson (1986) examines the two factors in toto by studying the relationship 
between the number of raindays in a month, the average rainfall intensity 
in raindays and the monthly total rainfall. 

In this report, we propose a mixed distribution model for the 
estimation of time mean areal average rainfall. The model is structured 
so that both factors can be combined in a single formulation. A mixed 
distribution is described (section 2) and applied to rainrate data 
collected during GATE. The GATE rainfall data and estimation procedures 
are described in sections 3 and 4. Section 5 presents our results 
for different sampling designs. The relative importance of the different 
contributing factors in the estimation scheme are examined in section 6. 
Section 7 discusses and concludes our findings. 


2. Mixed Distribution 

Most statistical distributions encountered in practice are 
either discrete or continuous. In the discrete case, the random variable 
assumes a finite (or countable) number of values, while in the continuous 
case, the variable assumes all values in an interval which can be 
finite or infinite. However, there are situations when the random 
variable assumes distinct values with positive probability and other 
values in a continuous interval. Such a random variable is said to 
have a mixed distribution. An example of a mixed distribution comes 
from the reliability and lifetime testing of light bulbs. When a light 
bulb is turned on at time zero, there is a positive probability that it 
will burn out immediately. If the light bulb does not burn out, it 
is left on for an hour. The probability that the light bulb 
burns out during the hour is positive. Hence the distribution of the 
lifetime X (in hours) has a jump at X=0, while in the interval (0,1] it is 
continuously differentiable (see Hogg and Tanis, 1977). The mixed 
distribution can be considered a special case of a mixture distribution. 

In the case of rainfall rate sampling, the probability of measuring 
no rain at any instant is large. Many previous studies have focused on 
the estimation of the raining portion of the distribution. It turns out, 
as we shall demonstrate in this paper, that the no rain probability is an 
important parameter in the estimation. 

The mixed distribution model of rainfall rates can be described as 
follows. Let $R$ be the rainfall rate sampled in space and time. The 
cumulative probability distribution (CPD) can be written as 

    $F(r) = P(R < r)$ 

where $P(R < r)$ is the probability that the rainfall rate $R$ is less than 
some fixed $r$. Let $P(R = 0) = 1 - p$. The conditional density 
of $R$ given $R > 0$ is 

    $f(r) = (1/p)\, \partial F / \partial r$ 

It follows that the generalized density $g(r)$ takes the form 

    $g(r) = 0$,          $r < 0$ 
    $g(r) = 1 - p$,      $r = 0$        (1) 
    $g(r) = p f(r)$,     $r > 0$ 

where $f$ is the density of $R$ conditional on $R > 0$. Thus the CPD can be written 
as 

    $F(r) = (1 - p) + p \int_0^r f(x)\,dx$,   $r > 0$ 

The expected mean of $R$ is 

    $E(R) = p \int_0^\infty x f(x)\,dx$ 

and the variance is 

    $Var(R) = p \{ \int_0^\infty x^2 f(x)\,dx - p [ \int_0^\infty x f(x)\,dx ]^2 \}$ 

The above mixed distribution can be described by several 
parameters, $p$ and $\theta = (\theta_1, \theta_2, \ldots)$, with 

    $f(r) = f(r, \theta)$ 

For a sample of size $n$ which consists of $m$ raining measurements and 
$n - m$ non-raining measurements, the likelihood function of $p$ and $\theta$ is 
given by 

    $L(p, \theta) = (1 - p)^{n-m} p^m f(r_1, \theta) \cdots f(r_m, \theta)$ 

The parameters can be estimated by various techniques such as the method of 
moments or maximum likelihood. 
The maximum likelihood estimate of $p$ is 

    $\hat{p} = m/n$ 

which is independent of any distribution model (i.e. $f$). 
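
The expressions for the mean and variance of the mixed distribution can be checked on a sample by replacing the integrals with sample moments of the raining measurements. This is a minimal sketch; the function name and the treatment of an all-dry sample are assumptions made here.

```python
def mixed_moments(samples):
    """Return (p_hat, mean, variance) of a sample under the mixed distribution.

    p_hat = m/n is the fraction of raining measurements; the integrals of
    x f(x) and x^2 f(x) are replaced by sample moments of the raining values.
    """
    n = len(samples)
    rainy = [r for r in samples if r > 0]
    m = len(rainy)
    if m == 0:
        return 0.0, 0.0, 0.0
    p_hat = m / n
    first = sum(rainy) / m                    # estimate of the integral of x f(x)
    second = sum(r * r for r in rainy) / m    # estimate of the integral of x^2 f(x)
    mean = p_hat * first
    variance = p_hat * (second - p_hat * first ** 2)
    return p_hat, mean, variance
```

Computed this way, the mean and variance coincide with the ordinary sample mean and (population) variance of the full sample including zeros, which is a convenient consistency check on the formulas.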
3. The data and sampling design 

This technique has been tested by applying it to rainfall rate data 
collected during GATE (the GARP [Global Atmospheric Research Program] 
Atlantic Tropical Experiment). GATE was an observational program 
conducted in the summer of 1974. During three periods of roughly three weeks, 
each termed a phase, detailed rainfall measurements from rain gauges and 
radars on an array of research vessels were made over an area called the 
B-scale. The B-scale area is centered at 8.5N, 23.5W and 
is about 200 km in diameter. Arkell and Hudlow 
(1977) composited the radar measurements from ships and presented an atlas of 
the radar echoes at 15 minute intervals. Patterson et al. (1979) 
converted the radar measurements to rainrates and presented rainrate 
data in 4 by 4 km bins. 

To examine the spatial and temporal structure of the rain field, 
various sampling designs have been used. A design is 
described by 3 indices (n,k,l). The first index (n) denotes the sampling 
frequency in time and the latter two (k,l) the sampling frequencies in the 
east-west (x) and north-south (y) directions in space respectively. For 
example, the design (1,10,10) denotes sampling continuously in time 
(i.e. all 15 minute scans) but sampling spatially only every tenth 
pixel (40 km apart) in the x and y directions. This mimics the 
sampling by a raingauge network that continuously measures the rainrate 
with gauges placed 40 km apart. The design (48,1,1) samples all pixels 
at an instant, but the time observations are taken only every 12 hrs (48 x 
15 minutes). This mimics the sampling by a densely scanning sensor on 
board a polar orbiting satellite which passes the same location twice 
per day at the same local times (e.g., 12 a.m. and 12 p.m.). 
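
A sampling design (n,k,l) amounts to strided subsampling of the sequence of rainrate maps. The sketch below assumes, purely for illustration, that the data are held as a nested list indexed as `field[scan][y][x]`; the function name and the starting offset of zero are likewise assumptions.

```python
def sample_design(field, n, k, l):
    """Subsample field[scan][y][x] according to the design (n, k, l).

    Keeps every n-th scan in time, every k-th pixel in the east-west (x)
    direction and every l-th pixel in the north-south (y) direction.
    """
    return [[row[::k] for row in scan[::l]] for scan in field[::n]]
```

With 15 minute scans and 4 km pixels, `sample_design(field, 1, 10, 10)` corresponds to the raingauge-like design and `sample_design(field, 48, 1, 1)` to the twice-daily satellite-like design described above.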
4. Estimation procedure 

Once the rainrate data are sampled, the parameters of the mixed 
distribution have to be estimated. The lognormal distribution has been 
adopted for the raining portion of the mixed distribution here. Much 
research effort has been devoted to modeling the rainrate distribution. 
Lognormality follows from the law of proportionate effects (Aitchison and 
Brown 1963) and physical cloud models have been proposed which can 
explain the lognormal distribution of cloud sizes (Lopez 1977). 
Studies using the GATE radar data have shown that rainrates, 
sizes of radar echoes and their durations follow lognormal distributions 
(Houze and Cheng 1977, Houze and Betts 1981). 

4.1 Lognormal distribution 

The lognormal distribution can be written as: 

    $f(r) = \frac{1}{r \sigma \sqrt{2\pi}} \exp[-(\log r - \mu)^2 / (2\sigma^2)]$,   $r > 0$    (2) 

The mean $\alpha$ and variance $\beta^2$ of the lognormal distribution are 

    $\alpha = \exp(\mu + \sigma^2/2)$ 
    $\beta^2 = \exp(2\mu + \sigma^2) [\exp(\sigma^2) - 1]$ 

(see Johnson and Kotz, p. 115, 1970). Consequently, the mean and variance 
of the complete mixed distribution are given by 

    $E(R) = p \exp(\mu + \sigma^2/2)$ 

    $Var(R) = p \exp(2\mu + \sigma^2) [\exp(\sigma^2) - p]$ 

            $= p \alpha^2 [\exp(\sigma^2) - p]$ 

A thorough discussion of the lognormal distribution is given by Aitchison 
and Brown (1963). 
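
The moment formulas of this subsection translate directly into code. The following sketch is illustrative; the function names are assumptions, and the variance parameter is passed as sigma squared.

```python
import math


def lognormal_moments(mu, sigma2):
    """Mean alpha and variance beta^2 of a lognormal with parameters mu, sigma^2."""
    alpha = math.exp(mu + sigma2 / 2.0)
    beta2 = math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - 1.0)
    return alpha, beta2


def mixed_lognormal_moments(p, mu, sigma2):
    """E(R) and Var(R) of the mixed distribution with rain probability p."""
    alpha, _ = lognormal_moments(mu, sigma2)
    mean = p * alpha
    variance = p * alpha ** 2 * (math.exp(sigma2) - p)
    return mean, variance
```

Setting p = 1 in `mixed_lognormal_moments` recovers the pure lognormal mean and variance, which is a quick check that the two sets of formulas agree.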
4.2 Minimum $\chi^2$ Estimation 

We have grouped the GATE rainrate data into different rainfall 
rate categories. The categories are 0-1, 1-2, 2-4, 4-6, 6-8, 8-10, 10-12, 
12-16, 16-20, and >20 mm/hr. The first category was chosen because it 
is difficult to distinguish non-raining pixels from pixels with only a trace 
of rain, which may be due to noise in radar reflectivity. This low cutoff 
at 1 mm/hr has been used in earlier studies (Austin & Geotis 1979). 
Because of this truncation, the estimates are slightly lower (about 2% 
of the estimated mean) than the means calculated directly, even after 
adjustments have been made. For a lognormal distribution with typical 
parameters found in our study, the interval (0,1) contains about 10 to 
15% of the rainy pixels. 

Minimum chi-square estimation is used in our procedure. This 
procedure is asymptotically equivalent to the maximum likelihood method 
obtained from (1) (Berkson 1980). The $\chi^2$ variate can be written as: 

    $\chi^2 = \sum_{i} (o_i - e_i)^2 / e_i$    (3) 

where the $o_i$ are the numbers of raining pixels observed in the i-th category and 
the $e_i$ are the corresponding frequencies from a lognormal distribution with 
parameters $\mu$ and $\sigma$. 

The truncated distribution $R_T$ for $R > 1$ mm/hr can be written as: 

    $R_T \sim f(r) / \int_1^\infty f(r)\,dr$ 

and so 

    $P(R_T < a) = [\int_1^a f(r)\,dr] / [\int_1^\infty f(r)\,dr]$ 

If the number of rainy pixels greater than 1 mm/hr is $N$, then 

    $e_1 = N [\Phi((\log 2 - \mu)/\sigma) - \Phi(-\mu/\sigma)] / [1 - \Phi(-\mu/\sigma)]$ 

where $\Phi$ is the distribution function of the standard normal distribution. 
Similar expressions can be obtained for the other $e_i$'s. 
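
The expected frequencies and the chi-square variate can be evaluated as sketched below for the nine categories above 1 mm/hr. Only the objective function is shown; the minimization over the two lognormal parameters is left to whatever optimizer is at hand. The names are illustrative assumptions.

```python
import math

# Category edges in mm/hr above the 1 mm/hr cutoff; the last category is open.
EDGES = [1, 2, 4, 6, 8, 10, 12, 16, 20, math.inf]


def std_normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(r, mu, sigma):
    """P(R < r) for a lognormal with parameters mu and sigma."""
    if r == math.inf:
        return 1.0
    return std_normal_cdf((math.log(r) - mu) / sigma)


def expected_counts(n_rainy, mu, sigma):
    """Expected frequencies e_i for n_rainy pixels above 1 mm/hr."""
    tail = 1.0 - lognormal_cdf(1.0, mu, sigma)  # P(R > 1), the truncation factor
    return [n_rainy * (lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)) / tail
            for lo, hi in zip(EDGES[:-1], EDGES[1:])]


def chi_square(observed, expected):
    """Chi-square variate: sum over categories of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A fit would minimize `chi_square(observed, expected_counts(N, mu, sigma))` over `mu` and `sigma`, with `observed` the nine category counts and `N` their sum.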

The $\chi^2$ estimation procedure can also shed some light on the complex 
structure of rainfall. The minimum $\chi^2$ value can be inflated or deflated 
for statistically dependent data even though the fit to the distribution 
is still good (see appendix). The dependence of the observations ($o_i$'s) 
is introduced in the sampling process. For the (48,1,1) design, too 
much spatial dependence may be introduced, while for the (1,10,10) design, 
too much temporal dependence may be introduced. 


4.3 Standard Error 

The expected mean and variance of the mixed lognormal distribution 
are given in subsection 4.1. We rewrite the expected mean as 

    $E(R) = p \alpha$    (4) 

where $\alpha$ is the mean of the lognormal distribution (conditional on rain) 
and $\alpha = \alpha(\theta)$. In this case $\theta = (\mu, \sigma)$. Since $\hat{p}$ and $\hat{\theta}$ are 
asymptotically independent, i.e., $\hat{p}$ and $\hat{\theta}$ become statistically independent 
as the number of observations and the number of rainfall categories go to 
infinity, the variance of the estimate of $E(R)$ can be expressed as a sum of the 
variances of $\hat{p}$ and $\hat{\alpha}$. Consider the function 

    $h(p, \alpha) = p \alpha$ 

If $\hat{p}$ and $\hat{\alpha}$ are independent, an expansion in the form of a Taylor series gives 

    $h(\hat{p}, \hat{\alpha}) = p\alpha + (\hat{p} - p)\, \partial h/\partial p + (\hat{\alpha} - \alpha)\, \partial h/\partial \alpha + \ldots$ 

                  $= p\alpha + (\hat{p} - p)\alpha + (\hat{\alpha} - \alpha) p$ 

and so 

    $Var(\hat{p}\hat{\alpha}) = \alpha^2\, var(\hat{p}) + p^2\, var(\hat{\alpha})$ 

If we consider the rain/no rain sequence as the outcome of Bernoulli trials 
with success rate $p$, the variance of $\hat{p}$ can be estimated as 

    $var(\hat{p}) \approx \hat{p}(1 - \hat{p})/m$ 

The variance of $\hat{\alpha}$ is (Aitchison and Brown, p. 46, 1963) 

    $Var(\hat{\alpha}) \approx (\hat{\alpha}^2/n)(\hat{\sigma}^2 + \hat{\sigma}^4/2)$ 

Hence an approximate expression for the variance of the estimate of $E(R)$ is 

    $Var(\hat{E}(R)) \approx \hat{p}^2 (\hat{\alpha}^2/n)(\hat{\sigma}^2 + \hat{\sigma}^4/2) + \hat{\alpha}^2 \hat{p}(1 - \hat{p})/m$    (5) 

Although the assumption about the independence of $\hat{p}$ and $\hat{\alpha}$ is not 
strictly valid, this expression provides an estimate of the standard 
error which is a good approximation to sampling errors obtained from 
ensembles of different sampling designs. 
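
Expression (5) can be evaluated as in the sketch below. The function name is an assumption, and the two sample counts that divide the variance terms are passed explicitly as `count_p` and `count_alpha` (hypothetical parameter names) so that the caller decides which count enters each term.

```python
import math


def standard_error_of_mean(p, alpha, sigma2, count_p, count_alpha):
    """Approximate standard error of the estimated mean p * alpha, after (5).

    var(p)     ~ p (1 - p) / count_p
    var(alpha) ~ alpha^2 (sigma^2 + sigma^4 / 2) / count_alpha
    Var(p alpha) ~ alpha^2 var(p) + p^2 var(alpha)
    """
    var_p = p * (1.0 - p) / count_p
    var_alpha = alpha ** 2 * (sigma2 + sigma2 ** 2 / 2.0) / count_alpha
    return math.sqrt(alpha ** 2 * var_p + p ** 2 * var_alpha)
```

The ratio of this standard error to the estimated mean `p * alpha` gives the relative sampling error of a design.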


5. Results 

The technique outlined in section 4 is applied to GATE data. 
Table 1 summarizes the results for the sampling design (8,8,8) for 
GATE 1. The $\chi^2$ value is 6.74. For 6 degrees of freedom, the 
95% point of the $\chi^2$ distribution is 12.6. Hence the hypothesis that the 
observed histogram can be fitted by a lognormal distribution cannot be 
rejected at the 95% level. With $\mu = 1.14$ and $\sigma^2 = 1.05$, the mean and 
variance of the lognormal distribution are 5.28 and 51.5 respectively. 
Fig. 1 shows the observed histogram for the design (8,8,8) and a fit to a 
lognormal distribution. 

The GATE data have been sampled by various designs, with sampling 
frequencies of 1 to a few hours in time and 8 to 40 km in space. The 
results for GATE 1 and 2 are summarized in table 2. Within this 
frequency range of sampling in space and time, the $\chi^2$ values are 
small and the lognormal distribution provides a good fit to the data. 
It is noted that these are sample estimates since each histogram is but 
one realization of a sampling design. 

5.1 Sensitivity to saturation at high rainrates 

A major problem associated with passive microwave sensors is the
saturation at high rainrates. A test was conducted using the sample obtained
from the (8,8,8) design, but with only 8 categories instead of 9. The
two heaviest rainrate categories are combined and the $\chi^2$ statistic
computed. The results for the two runs are very similar; the estimated
mean rainrates are within 5% of each other. Since only 8 categories are
used, the degrees of freedom are reduced and the 95% critical value of
$\chi^2$ is accordingly lower. This sensitivity test serves to illustrate the
possibility of applying this method to estimate rainfall from existing
microwave measurements, such as those of the Electrically Scanning
Microwave Radiometer flown on board NIMBUS V, since this technique is
not sensitive to the problem of saturation at high rainrates.

5.2 Comparison with Gamma Distributions 

In this subsection, we compare the fits of the
lognormal and Gamma distributions. The Gamma distribution can be written
as

$$f(r) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} \exp(-r\lambda), \qquad r > 0,\ \alpha, \lambda > 0$$

where $\Gamma(\alpha)$ is the Gamma function. A procedure as outlined in section
4 was carried out and the results for the lognormal and Gamma distributions
for some selected designs are given in table 3. For most of the designs the
lognormal distribution gives a better fit to the observed histogram
in the minimum $\chi^2$ sense. However as we shall demonstrate later, the
exact choice of the rainrate distribution is not crucial in the estimation
scheme.

5.3 Satellite sampling 

To mimic the satellite sampling of rainfall by a polar orbiting
satellite, the design (48,1,1) is applied to GATE. This is equivalent
to sampling at roughly 12 hour intervals. Within the GATE period, there
are periods when observations are missing. This design samples every 48th
snapshot in the sequence, paying no attention to missing periods.
Hence not all samples are at intervals of 12 hours. As a comparison,
sample estimates from the designs (24,1,1), (72,1,1) and (96,1,1), i.e.,
sampling at intervals of 6, 18 and 24 hours, are made for GATE 1 and 2.
The results are summarized in table 4. Although the $\chi^2$ values are
large, probably due to over-sampling in space and inadequate sampling
in time, the estimated rainrates are quite close to the actual mean
values of 0.45 and 0.37 for GATE 1 and GATE 2.

Since the design (48,1,1) samples every 48th snapshot,
48 distinct estimates from this design can be realized; i.e., the
first estimate is derived from sampling the 1st, 49th, 97th, ... snapshots,
the second from the 2nd, 50th, 98th, ... and so on to the
48th estimate. The estimated means from these sample designs form a
sample distribution. The histograms of these estimated means are shown
in fig. 2 (left column). The means and standard deviations of these
distributions are computed and indicated in the figures. It should
be noted that the members of the sampling ensemble are not independent.
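
The construction of the sampling ensemble can be sketched as follows: for a sampling step of 48 there are 48 possible starting offsets, each giving one estimate of the mean. This is only a minimal sketch on a one-dimensional series of area-averaged values; the synthetic series below is a hypothetical stand-in for the GATE snapshots, and the function names are illustrative.

```python
import random
import statistics

def systematic_estimates(series, step):
    """Return one sample mean per starting offset 0 .. step-1.

    Offset k uses elements k, k+step, k+2*step, ... of the series.
    """
    if len(series) < step:
        raise ValueError("series must contain at least `step` values")
    return [statistics.mean(series[k::step]) for k in range(step)]

def ensemble_summary(estimates):
    """Mean and (population) standard deviation of the ensemble of estimates."""
    return statistics.mean(estimates), statistics.pstdev(estimates)

# Hypothetical intermittent, skewed series standing in for 15 minute snapshots.
rng = random.Random(0)
snapshots = [rng.lognormvariate(-1.5, 1.0) if rng.random() < 0.6 else 0.0
             for _ in range(48 * 36)]

estimates = systematic_estimates(snapshots, 48)
mean_est, sd_est = ensemble_summary(estimates)
print(len(estimates), round(mean_est, 3), round(sd_est, 3))
```

Because every snapshot belongs to exactly one offset when the series length is a multiple of the step, the mean of the ensemble equals the mean of the full series; the spread of the ensemble is what measures the sampling error.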

If the local diurnal cycle can be described entirely in terms of the
first harmonic, sampling twice a day at 12 hour intervals is sufficient
to specify the diurnal cycle. However any higher harmonics would
introduce a bias. It is therefore of interest to examine the sampling
errors associated with sampling intervals slightly less than 12 hours
so that the diurnal cycle is sampled through the course of about a month.

A unique feature associated with the proposed Tropical Rainfall
Measuring Mission (Theon et al. 1986) is a revisit time of the
satellite of roughly 10 hours, giving a total of about 80 partial visits
(30 complete views) of a 600 km by 600 km grid box. We mimic this strategy
by the sampling design of (40,1,1). This design will actually give more
than 30 complete visits per month. The important point here is that
it will sample through the diurnal cycle. The histograms for the 40
estimated means are given in fig. 2 (middle column). There is a reduction
in the standard deviation of the estimated means of the (40,1,1) design
compared to the (48,1,1) design even though the number of estimated means
is smaller in the former case.


5.4 Network Sampling 

To mimic the sampling by a network of gauges, the rainrates in GATE 1
and 2 are sampled by the (1,10,10) design. Similar to the procedure
described in section 5.2, 100 samples are obtained from the (1,10,10) design.
The 100 different samples are obtained by sampling which starts at different
locations in space. From the 100 sample estimates of the rainfall
rates, the sample means for GATE 1 and 2 are 0.446 and 0.367 mm/hr and
the s.d.s are 2.6% and 2.2% of the means respectively. The normality
of the estimated rainfall rates is tested by using a minimum $\chi^2$
test similar to that described in section 4.2. The estimated rainfall
rates are divided into 10 equal interval categories and the $\chi^2$ values
are computed to be 4.9 and 7.2 respectively for GATE 1 and 2, compared to
the 95% critical value of about 14. The hypothesis of normality therefore
cannot be rejected at the 95% level.
6. Correlations and analyses of variance

In estimating the standard error, the independence of p and $\alpha$ is
assumed. This assumption can be examined by considering the
correlation between the mean rainrate conditional on rain ($\alpha$) and p.
For each of the 15 minute observations, p is calculated as the fraction
of pixels with rain rate in excess of 1 mm/hr relative to the total number of pixels,
$\alpha$ is calculated as R/p, and observations with p = 0 are not considered in
the calculation.

The linear correlation between $\alpha$ and p is 0.58 (0.52) whereas
that between p and R is 0.94 (0.94) for GATE 1 (2). Similar relations
are also found in the GATE 3 data (Lovejoy 1982). The correlation
coefficients between the logarithms of the quantities are higher. The
results are summarized in fig. 3 which shows the scatter diagrams between
the three quantities. The correlation coefficient between log p and
log R is 0.99 for both GATE 1 and 2 whereas that between log $\alpha$ and log
p is 0.63 and 0.51 respectively for GATE 1 and 2.

The histograms of R, p and $\alpha$ are given in fig. 4. The
distribution of p is skewed. Since the value of p lies between
0 and 1, a fit to a beta distribution may be appropriate. There is zero
probability that the whole of the GATE area is totally covered with rain.
Obviously, the parameters of the distribution are dependent on the size
of the area. Chiu and Kedem (1986) examined the fractional area for an area
of about 40 km by 40 km using the same GATE data. In this instance, there
are times when the smaller area (40 km by 40 km) is fully covered.

There are times when the GATE area is totally rain free for
a cutoff of 1 mm/hr. If a lower cutoff is used, e.g. 0 mm/hr, the
fractional rain free time is accordingly reduced.

We also examined the contributions of the variances of p and $\alpha$ to
that of R. If we take the logarithm of the equation

$$R\ (\text{mm/hr}) = p\,\alpha\ (\text{mm/hr})$$

we get

$$\log R = \log p + \log \alpha$$

the variance of which is

$$\mathrm{var}(\log R) = \mathrm{var}(\log p) + \mathrm{var}(\log \alpha) + 2\,\mathrm{cov}(\log p, \log \alpha)$$

              var(log p)    var(log α)    2 cov(log p, log α)
    GATE 1      (77%)          (2%)            (21%)
    GATE 2      (78%)          (2%)            (20%)

The contributions of each term are given in parentheses for GATE 1 and 2.
It can be seen that the variance of log p dominates the variance of log R.
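
The decomposition above can be sketched numerically. The function below takes paired series of p and the conditional mean rainrate (both strictly positive, i.e. with the p = 0 observations already excluded) and returns the fractional contribution of each term to var(log R). The function name and the sample inputs are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

def log_variance_shares(p, alpha):
    """Shares of var(log p), var(log alpha) and 2*cov in var(log R), R = p*alpha."""
    lp = [math.log(x) for x in p]
    la = [math.log(x) for x in alpha]
    n = len(lp)
    mp, ma = sum(lp) / n, sum(la) / n
    var_p = sum((x - mp) ** 2 for x in lp) / n
    var_a = sum((y - ma) ** 2 for y in la) / n
    cov_pa = sum((x - mp) * (y - ma) for x, y in zip(lp, la)) / n
    total = var_p + var_a + 2.0 * cov_pa          # equals var(log R)
    return var_p / total, var_a / total, 2.0 * cov_pa / total

# Hypothetical values for illustration only.
p_obs = [0.05, 0.10, 0.30, 0.20]
alpha_obs = [3.0, 4.0, 6.0, 5.0]
shares = log_variance_shares(p_obs, alpha_obs)
print([round(s, 3) for s in shares])
```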
7. Discussions and Conclusions

It is demonstrated that the mixed distribution model provides a
good estimate of time mean areal average rainfall at least for GATE type
situations. The advantage of this model is its simplicity. Once the
rainrates are sampled, the parameters can readily be estimated.

The mixed distribution approach suggests a retrieval algorithm
for the estimation of monthly rainfall from satellites. If a functional
relation exists between rainfall rates and radiance measurements, such as
that proposed by Wilheit (1977), one would then accumulate the
radiance measurements and compute histograms of radiance for the month.
The histogram in radiance is then transformed into rainfall rates by the
radiance-rainfall rate relation. The parameters of the lognormal distribution
of the resultant rainfall rate histogram are then estimated to get the
mean and variance. Consideration must be given to other factors such as
beam filling and the variation of pixel size as a function of beam position.

To mimic satellite and rain gauge network sampling, various sampling 
designs have been devised. The sampling errors are about 10% for sampling 
by a polar orbiting satellite ((48,1,1) design). The sampling errors 
are reduced to about 5% for a satellite observation at low inclination 
((40,1,1) design). McConnell & North (1987, this issue) examine sampling 
errors for four rainrate categories which contribute about equally to the 
total rainfall for sampling every 65t) minutes of the same data. They 
found that the sampling errors in each of the rainrate categories are about 
10%. If the categories are independent, the error in the total is reduced
by a factor of $\sqrt{4}$, which is consistent with the 5% error found in this study.

We found a very strong correlation between the average rainfall
rate and the fractional rain area in the GATE area. Chiu & Kedem (1986) 
had examined the usefulness of the fractional area with rainrates in excess 
of 20 mm/hr to estimate total rain volume. The high cutoff mimics the cloud 
index of Arkin (1979) to delineate fractional high cloud area. They 
found that the fractional light rain (rainrates greater than 1 mm/hr) 
area gives a better model than that which uses the fractional heavy rain area
(rainrates greater than 20 mm/hr) alone. But when the two variables are 
used together, a much better model is obtained. 

Jackson (1986) found that the monthly total rainfall in some tropical
stations is strongly related to the number of raindays but bears
little relation to the average daily intensity. A fair amount of skill
has been achieved in the prediction of rain amount by the rain area as
depicted in satellite visible and infrared imagery (Lovejoy & Austin
1979). Radar meteorologists have also found that the so called "Area
Time Integral" (ATI) is a useful indicator of rain volume (Lopez 1982,
Doneaud et al. 1982a). Doneaud et al. (1982b) have applied the idea to
rain gauge measurements. They also found that the percent of time when
it rains is significantly related to the total rainfall. These are
consistent with our findings of the importance of the parameter
p. If we consider a design which samples all pixels in time and space,
the estimated p is equivalent to the ATI. The improvement over the ATI
technique by the mixed distribution would be derived from a knowledge of
the distribution of the rainrates conditional on rain. It provides
an estimate of the average rainrate intensity which replaces the
climatological average often used in rain total estimates.

The importance of the fractional rain area in rainfall estimation
has strong implications for satellite rainfall monitoring. Because of the
absorption properties of raindrops, microwave sensors can clearly
distinguish between rainy and non rainy areas. This special feature
points to the need for microwave sensors, either active or passive, in the
remote sensing of rain. These measurements, when used in conjunction
with measurements from geostationary satellites such as GOES, can provide
accurate monthly mean rainfall measurements.

Perhaps the most important conclusion that we can draw from this
work is that, to the extent that the GATE data are representative of
oceanic rainfall in the tropics, revisiting an area of roughly the GATE
dimension (350 km by 350 km) at a repetition rate of about once every 10
to 12 hours provides an excellent estimate (of the order of 5 to 10% sampling
error) for the area average three week mean rainrate for the region.
This is within the capability of a single space platform with scanning
sensors in a low inclination (tropical) orbit. This result is in good
agreement with the work of Laughlin (1981) who used a rather different
(Markov process) approach but based also upon the same GATE data.
Appendix: Remarks on the use of $\chi^2$ for dependent data

There is ample evidence that the rainrate is lognormally
distributed, as illustrated by the small $\chi^2$ values and the excellent
fit. When the $\chi^2$ value is large (even though the estimated parameters
obtained from the minimum chi square estimation are very similar) it is
usually associated with sampling designs that sample the rainrate at
points in time or space that are close to each other. This may not mean
that the fit to the lognormal distribution is not good, but may suggest
dependence in the sample. This can be understood as follows.

Let $\hat{p}_i = O_i/N$, with $i = 1, \ldots, 9$, and let $p_i = E(\hat{p}_i) = e_i/N$, where E(x) is
the expected value of x. Define the vectors $\hat{\mathbf{p}} = (\hat{p}_1, \ldots, \hat{p}_8)'$, $\mathbf{p} = (p_1, \ldots, p_8)'$
and $\mathbf{1} = (1, \ldots, 1)'$ and put

$$A = \mathrm{diag}(1/p_1, \ldots, 1/p_8) + \mathbf{1}\mathbf{1}'\,(1/p_9)$$

then we have

$$\chi^2 = \sum_{i=1}^{9} (O_i - e_i)^2/e_i = N\,(\hat{\mathbf{p}} - \mathbf{p})'\,A\,(\hat{\mathbf{p}} - \mathbf{p}) \qquad (*)$$

Assuming that the rainrates satisfy some dependence condition (e.g.
finite-dependence as discussed by Anderson 1971, p427) so that for large
N, $\sqrt{N}(\hat{\mathbf{p}} - \mathbf{p})$ converges in law to a normal distribution,

$$\sqrt{N}\,(\hat{\mathbf{p}} - \mathbf{p}) \xrightarrow{L} N(0, V)$$

then for sufficiently large N

$$\chi^2 \approx \sum_{i=1}^{8} \lambda_i Z_i^2$$

where the $Z_i^2$ are independent $\chi^2(1)$ variables. If the sampled rainrates
are independent, $\lambda_i = 1$ for all i and $\chi^2$ is distributed as a
chi-square variable with 8 degrees of freedom. But if the sampled rainrates
are dependent, $\lambda_i \neq 1$ and (*) can be inflated or deflated since its
asymptotic expected value is $\sum_i \lambda_i$. The practical outcome emerging
from this discussion is that large values of (*) may indicate dependence
despite a possible perfect fit. As the rainrates are sampled further
apart in time and space, they become reasonably independent and the
distribution of (*) is close to a chi square distribution with 8 degrees
of freedom adjusted for the number of unknown parameters. See also
Kedem and Slud (1981) who discuss a similar quadratic form whose values
are inflated due to dependence of the data.
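
The identity (*) between the usual chi-square sum and the quadratic form can be checked numerically. The sketch below does so with the observed and expected counts from Table 1; it expands the quadratic form with A = diag(1/p_1, ..., 1/p_8) + 11'/p_9 by hand rather than building the matrix. The function names are illustrative.

```python
# Observed and expected counts per rainrate class, copied from Table 1.
observed = [453, 590, 325, 207, 116, 60, 82, 52, 80]
expected = [450, 598, 324, 188, 116, 76, 88, 46, 79]

def chi2_sum(obs, exp):
    """Chi-square as the sum over all classes of (O - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi2_quadratic_form(obs, exp):
    """Chi-square as N (p_hat - p)' A (p_hat - p) over the first k-1 classes.

    Requires sum(obs) == sum(exp), so that the deviation in the last class
    is minus the sum of the deviations in the other classes.
    """
    n_total = sum(obs)
    k = len(obs)
    p_hat = [o / n_total for o in obs]
    p = [e / n_total for e in exp]
    d = [p_hat[i] - p[i] for i in range(k - 1)]
    diagonal_part = sum(d[i] ** 2 / p[i] for i in range(k - 1))
    rank_one_part = sum(d) ** 2 / p[k - 1]      # contribution of 11'/p_k
    return n_total * (diagonal_part + rank_one_part)

print(chi2_sum(observed, expected), chi2_quadratic_form(observed, expected))
```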
Table 1.

Results for (8,8,8) sampling

    class      O_i      e_i
    1-2        453      450
    2-4        590      598
    4-6        325      324
    6-8        207      188
    8-10       116      116
    10-12       60       76
    12-16       82       88
    16-20       52       46
    >20         80       79
    total     1965     1965

$\chi^2 = 6.74$, $\mu = 1.14$, $\sigma^2 = 1.047$, $\alpha = 5.28$, $\beta^2 = 51.5$
Table 2:

Estimated means, minimum $\chi^2$ and fraction of rain for different designs.

GATE 1

    n    (k,l) = (4,4)    (6,6)          (8,8)          (10,10)
    2    .44              .44            .45            .46
         (11.1) 8.3       (15.3) 8.2     (4.2) 8.2      (2.4) 8.7
    4    .44              .44            .45            .46
         (3.7) 8.3        (13.9) 8.1     (3.8) 8.4      (9.4) 8.4
    6    .45              .43            .45            .46
         (7.9) 8.3        (2.9) 8.1      (3.8) 8.2      (2.1) 8.9
    8    .44              .44            .44            .44
         (2.7) 8.3        (4.9) 8.0      (6.7) 8.3      (7.9) 8.3
    10   .45              .45            .45            .43
         (4.3) 8.3        (4.5) 8.2      (5.2) 8.3      (3.4) 8.1

GATE 2

    n    (k,l) = (4,4)    (6,6)          (8,8)          (10,10)
    2    .37              .36            .37            .36
         (59.9) 6.8       (19.8) 6.8     (36.9) 6.9     (10.3) 6.9
    4    .37              .36            .36            .35
         (50.4) 6.9       (8.4) 6.9      (23.8) 7.1     (9.7) 7.0
    6    .38              .37            .39            .36
         (27.6) 7.0       (12.1) 6.9     (16.9) 7.1     (18.1) 7.0
    8    .36              .34            .35            .37
         (23.8) 6.8       (4.9) 6.9      (17.4) 7.0     (9.3) 7.2
    10   .38              .37            .39            .37
         (19.3) 7.1       (8.5) 7.0      (7.7) 7.2      (5.4) 7.2

Estimated rainrate in mm/hr on top line. Minimum chi square value in
parentheses. The estimated rain probability, p, appears to the right of
the chi square value, in percent.
Table 3. Comparison between Gamma and lognormal distribution fits to various
designs

                             lognormal                    Gamma
    design        n*      μ       σ²      χ²        α       λ       χ²
    (30,10,10)    333    1.00    1.16     6.04     0.29    0.12     5.57
    (20,10,10)    456    1.10    1.07     7.76     0.37    0.13    12.14
    (10,10,10)    972    1.09    1.18     3.39     0.30    0.10    13.73
    ( 5,10,10)   1976    1.06    1.21     6.80     0.34    0.10    32.46
    (10, 5, 5)   3936    1.12    1.12     8.77     0.35    0.12    44.24
    ( 5, 5, 5)   7889    1.11    1.13    16.83     0.35    0.12    85.87
    (10,20,20)    219    1.32    1.00     4.98     0.49    0.12    12.39
    ( 5,30,30)    263    1.09    1.41     6.53     0.26    0.09     4.09
    ( 5,20,20)    461    1.19    1.07     0.80     0.41    0.12     7.21

n* is the number of raining pixels
Table 4. Comparisons of estimates from sampling designs for GATE 1 and 2.


                          GATE I                                   GATE II
    design       n*      p      <R>    St. err   χ²       n*      p      <R>    St. err   χ²
    (24,1,1)   42237   .083    .448    .0079    27.4    30111   .069    .364    .0083   148.1
    (48,1,1)   22976   .088    .514    .0119    48.1    14156   .066    .317    .0107    77.4
    (72,1,1)   14533   .086    .457    .0135    20.9     8826   .061    .316    .0141    77.2
    (96,1,1)   11622   .089    .572    .0187    22.5     6409   .058    .282    .0151    46.7

n* is the number of raining pixels
Figures

1. Histogram of rainfall rate sampled from GATE by the design (8,8,8).
The curve is a lognormal fit to the histogram with parameters
$\mu = 1.14$ and $\sigma^2 = 1.05$ which are estimated by the method of minimum
chi square.

2. Histograms of the estimated means from sampling designs of (48,1,1),
(40,1,1) and (1,10,10) (left, middle and right column respectively) for
GATE 1 (upper) and GATE 2 (lower). The total numbers of samples
are 48, 40 and 100 for the three designs. The means and standard
deviations are included in the upper right hand corners.

3. Scatter diagram of the logarithm of the average rainfall rate (R)
and fractional rain area in the GATE area for GATE 1 (upper) and
GATE 2 (lower). A cutoff value of 1 mm/hr is used to distinguish between
rainy and dry pixels. The correlation coefficients are indicated in
the upper left hand corners.

4. Scatter diagram of the logarithm of average intensity of the rainy pixels
($\alpha$) and fractional rain area (p) for GATE 1 (upper) and GATE 2 (lower).
The correlation coefficients are indicated in the upper left hand corners.

5. Histograms of the average rainfall rate (R), fractional area (p) over the
GATE area and average intensity of the rainy pixels ($\alpha$) (left,
middle and right columns) for GATE 1 (upper) and GATE 2 (lower).
The means and standard deviations are indicated in the upper right hand
corners. The numbers below the means and standard deviations on the
histograms of R indicate there are 1622 (1419) observations out of
1716 (1512) with p > 0 in GATE 1 (2).
Attachment C: A Rainfield Simulation Model


A. Simulation of Rain Field Snapshots

In the absence of real rainrate data, it is useful to generate
artificial data by stochastic models which preserve certain specified
statistics. Also, such a model is very helpful in assessing the
outcomes of controlled experiments. We have developed such a model
and intend to use it in the next phase. A source FORTRAN program is
attached.

A.1 A Stochastic Rain Field Model

In what follows, we describe a simulation model which generates
artificial "radar" snapshots of a rain field.

Our model is made of three parts, one of which is fixed while the
others move in relation to the fixed part. The three parts are (see
figure A1):

(a) Spatial random rainfield (moving); 

(b) Cloud field (fixed); and 

(c) Moving window (moving). 

This is a very flexible model which can accommodate any kind of cloud 
and rain fields. 


[Figure A1: schematic of the model, showing the moving sampling window,
the fixed cloud field, and the moving random lognormal rainfield.]


A.2 Random Rain Field

This field consists of a spatial moving average with specified
distribution for its rainrate (in this case lognormal) and specified
spatial correlation. This is the bottom part and should be thought of
as an infinite random field which is being constantly shifted. For
example, we can use a field of the form

$$R(i, j) = \exp[Y(i, j) + 1.140]$$

where

$$Y(i, j) = E(i, j) + 0.1084\,[E(i-1, j) + E(i+1, j) + E(i, j-1) + E(i, j+1)], \qquad i, j = 0, \pm 1, \pm 2, \ldots$$

where E(i, j) is white Gaussian noise. In this case, R(i, j) has a
lognormal distribution

$$\Lambda(\mu_R, \sigma_R^2)$$

with parameters $\mu_R = 1.14$ and $\sigma_R^2 = 1.047$.

The coefficient 0.1084 is needed for stationarity requirements. We
can easily change this model to suit any correlation requirement.
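
A minimal sketch of this moving-average field is given below, assuming that E(i, j) is white Gaussian noise with zero mean and unit variance (the text does not state the noise variance explicitly). Under that assumption the variance of Y is 1 + 4(0.1084)^2, which is approximately 1.047 and so matches the quoted value of the lognormal variance parameter. The function names and the grid size are illustrative and are not taken from the attached FORTRAN program.

```python
import math
import random

COEF = 0.1084   # neighbour weight from the text
MU_R = 1.14     # lognormal location parameter from the text

def simulate_rainfield(n, seed=0):
    """Return an n-by-n lognormal field R = exp(Y + MU_R).

    Y is the five-point moving average of unit-variance white Gaussian
    noise (unit variance is an assumption of this sketch).
    """
    rng = random.Random(seed)
    # One extra row/column of noise on each side supplies the neighbours.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n + 2)] for _ in range(n + 2)]
    field = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, n + 1):
            y = noise[i][j] + COEF * (noise[i - 1][j] + noise[i + 1][j]
                                      + noise[i][j - 1] + noise[i][j + 1])
            row.append(math.exp(y + MU_R))
        field.append(row)
    return field

def log_variance_theory(coef=COEF):
    """Variance of Y for unit-variance noise: 1 + 4 * coef^2."""
    return 1.0 + 4.0 * coef * coef

field = simulate_rainfield(100)
logs = [math.log(r) for row in field for r in row]
sample_mean = sum(logs) / len(logs)
sample_var = sum((x - sample_mean) ** 2 for x in logs) / len(logs)
print(round(log_variance_theory(), 4), round(sample_mean, 2), round(sample_var, 2))
```

Changing the weights or enlarging the averaging stencil alters the spatial correlation while the same variance bookkeeping applies.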

A.3 Cloud Field

The cloud field covers a certain large area (e.g., GATE area) and
consists of clouds whose areas are very close to being lognormally
distributed. It is a fixed field located above the rainfield. The
"clouds" are to be thought of as "holes" or "windows" through which we
see rain. At a given time instant, what we see through a given cloud
is precisely its content. This content keeps changing since the
rainfield is moving.

Here is how a cloud is generated in a field of area $10^4$ pixels.
Consider, for example, an interval of length 100 from which a point is
selected at random. From that point, we measure a random length whose
distribution is lognormal with parameters $\mu$, $\sigma^2$. Let X be the part of
this length which overlaps with the interval (0, 100). Then, by
properly conditioning X, we have

$$E(X) \approx [200 \exp(\mu + \tfrac{1}{2}\sigma^2) - \exp(2\mu + 2\sigma^2)]/200$$

The square of this quantity (by independence) can be thought of
as the average size of a "random cloud." Let

M = number of clouds

Then the fractional rainy area over a field of area $10^4$ is given by

$$\frac{M \times E^2(X)}{10^4}$$

Thus, we can control the probability of rain by $\mu$, $\sigma^2$, and M.
The table below illustrates this fact.


    μ      σ²      E(X)       M      Probability of Rain
    1      1       4.208      50     0.088
    1      1       4.208      40     0.071
    1      0.5     3.3899     70     0.08
    1      2       5.3719     28     0.08


The probability of rain is fixed over the cloud field but can
obviously change for subfields.

The truncation at, say, 100 is needed because the rainy area under
study in real life is usually an arbitrary area taken from a much
larger area by truncation.
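
The way the cloud parameters control the probability of rain can be sketched as follows. The expected overlap is written here as exp(μ + σ²/2) - exp(2μ + 2σ²)/200 for an interval of length 100, which is the form that reproduces the E(X) values listed in the table above; the function names and the `length` parameter are illustrative.

```python
import math

def expected_overlap(mu, sigma2, length=100.0):
    """Approximate E(X): mean lognormal length truncated by the interval end.

    The first term is the lognormal mean; the second is the lognormal second
    moment divided by 2 * length, accounting for truncation when the start
    point is uniform on (0, length).
    """
    return math.exp(mu + sigma2 / 2.0) - math.exp(2.0 * mu + 2.0 * sigma2) / (2.0 * length)

def rain_probability(mu, sigma2, n_clouds, length=100.0):
    """Fractional rainy area: M * E(X)^2 / length^2."""
    return n_clouds * expected_overlap(mu, sigma2, length) ** 2 / length ** 2

# Parameter combinations from the table in the text.
for mu, sigma2, m in [(1, 1, 50), (1, 1, 40), (1, 0.5, 70), (1, 2, 28)]:
    print(mu, sigma2, round(expected_overlap(mu, sigma2), 4), m,
          round(rain_probability(mu, sigma2, m), 3))
```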

A.4 The Sampling Window

The third part is a moving window which moves at random over the
cloud field. Each time the content of the window is observed, we call
it a snapshot.

Figure A2 shows a typical snapshot with a sampling window of 20 x
20 pixels. The zeroes denote the no rain areas, and the rainrates are
given in mm/hr.

A.5 Comparison with Laughlin's Results

To estimate the error in satellite sampling, Laughlin (1982)
computed the temporal autocorrelation function as a function of
averaging area using the GATE data. Since our model parameters are
constrained by the GATE observations, a calculation similar to
Laughlin's was carried out and the results are presented in figure A3. The
results are very similar to those of Laughlin, as anticipated.
Program listing to generate rain fields from the simulation model


C     THIS PROGRAM CALCULATES FIELDS FROM BEN KEDEM'S TIME-
C     DEPENDENT RAINFALL RATE MODEL AND WRITES THEM TO TAPE FOR FURTHER
C     ANALYSIS.
C
C     INPUT : DSEED0 - ARRAY OF SEEDS FOR THE RANDOM NUMBER GENERATOR FOR
C                      CALCULATING RAIN FIELDS
C             DELTAT - TIME IN HOURS OF EACH TIME STEP (USEFUL RANGE .25
C                      TO 24.)
C             NDAYS  - NUMBER OF DAYS FOR WHICH RAIN FIELDS ARE TO BE
C                      CALCULATED
C             NSTEPS - NUMBER OF TIME STEPS FOR WHICH RAIN FIELDS ARE
C                      TO BE CALCULATED
C             NCALLS - NUMBER OF CALLS TO BE MADE TO THE RAINFALL CALCULATING
C                      PROGRAM
C             N      - NUMBER OF POINTS IN EACH DIMENSION OF THE RAINFALL ARRAY
C             RESIZE - SIZE OF THE SIDES OF PIXELS IN KM
C             GPHASE - GATE PHASE OF THE SIMULATION INPUT PARAMETERS
C             AVAREA - AVERAGE AREA OF CONTIGUOUS RAIN PATCHES
C             AVFTR  - AVERAGE FRACTION OF A REALIZATION THAT HAS RAIN
C
C     INTERNAL : I    - LOOP AND ARRAY INDEX
C                SEED - ARRAY OF SEEDS FOR RANDOM NUMBER GENERATORS TO BE
C                       PASSED TO SUBROUTINES IN ORDER TO LEAVE THE DSEED0
C                       ARRAYS UNCHANGED
C                TIME - CUMULATIVE TIME OF THE REALIZATION
C
C     OUTPUT : RR - TWO-DIMENSIONAL ARRAY OF RAINFALL RATES
C
C     SUBROUTINES : BKRAIN - CALCULATES RAINFALL RATE ARRAYS AND WRITES
C                            THEM TO TAPE


      REAL*4 RR(128,128),RESIZE,RINC
      REAL*4 TIME,DELTAT,AVAREA,AVFTR,GPHASE
      REAL*8 DSEED0(8),SEED(4)
      INTEGER*4 N,NSTEPS,NDAYS,NCALLS,IDAY
      INTEGER*4 I,II,J,L
      DATA NDAYS/2/,N/125/,NSTEPS/24/
      DATA AVFTR/.09/,DELTAT/1.00/
      DATA AVAREA/450./,RESIZE/4.0/,RINC/5.0/
      DATA DSEED0/3141592.D0,2314159.D0,9231415.D0,5923141.D0,
     C1592314.D0,4159231.D0,1415923.D0,1341592.D0/
C     INITIALIZE VARIABLES
      IDAY=0
      TIME=0.0
      NCALLS=NINT((FLOAT(NDAYS)/2.0)+0.4)
      WRITE(6,9) NCALLS
    9 FORMAT(' ','NCALLS= ',I3)
      DO 11 J=1,4
      SEED(J)=DSEED0(J)
   11 CONTINUE
C     LOOP OVER CALLS TO RAINFALL MODEL
      DO 20 II=1,NCALLS
      CALL BKRAIN(N,RESIZE,GPHASE,DAY,TIME,AVAREA,AVFTR,SEED,DELTAT,
     CRR,IDAY,IOS)
      IF (IOS .NE. 0) THEN
      WRITE(6,99)
   99 FORMAT(' ','I/O ERROR')
      GO TO 100
      END IF
   20 CONTINUE
      CALL LEVPTC(N,RINC,RR)
      END

      SUBROUTINE BKRAIN(NUM,RESIZE,GPHASE,DAY,TIME,AVAREA,AVFTR,SEED,
     CDELTAT,RR,IDAY,IOS)
C     BEN KEDEM, UNIVERSITY OF MD
C     AUG. 20, 1986
C
C     THIS PROGRAM GENERATES RAINFALL FIELDS WITH GATE-LIKE STATISTICAL
C     PROPERTIES.


DPAi I rNlC^OOI.LGNPlSOOI.UKSOO.USISOOI.YOOO , 30 0), FI 30 C , 3 00 I 

pisi 2E:rP(l2a,i:»).G'>HASF, DAY. time. AVARFA,AVFTP,OFLTAT 

RFftL+4 xM,SIGVA,SIG?C.U(90000) 

*I NTEC-rF*^‘^ I V ( 2 C07 20 0?!^^ II . i J . M , WM .VIM.MNJ.MAXI.MAXJ.II.JI 
INtIc-EF + 4 I 1 1. JJJ .NSNP.NOAYS.I CS.NUM ,IOAY 
CHARACTEP*7 FILELE(30) 


AND CD statements 
11 1 F04 • - * 


CHAPACTEP CCNSTANTS FOR FILE PAPAMETPP 

C^FTl 1F0«* /' FTI 1P07 • , *=11 lP0=i' . *FT1 IF C^* . 'FT I 1^10' . • FTI IFM 


■?S' ,*“T11F2 6‘, •FT11F2' 


• , • FT 1 1F23 • , 


• FT I 1 F07 * 

• FT 1 IF 1 S * 

C'FTllRlE'.'^Tl IFIC' 

C'FTl 1P2A* ,* FTllF 
r » FT 1 1F30 • / 

<;DFriFY FAPAVETERS and CS' s, ( osfeds ) 

^ DATA DSI/42S7/.DS2/F01 /,DS3/9074/,OSA/419/ 

DS1=SEF0 ( 1 ) 

0£2=SFEn( 2 ) 

DS3=£EEC ( 3) 

DS4= SEEC (4 ) 

XM=1 .0 

MMM=NI NT^ ( NLM*NUM*RE£IZE*RESIZE*AVFT R ) /AV ARE A) 
V'RITE (6,9) MMM 

9 FCRMAT(* CF RAIN PATCHES *,I5) 

S IGMA=SCRT ( £ IGSO ) 

N = 500 

DC 5 1= 1 , 20 C 
DC 6 J=l,200 
I V(I ,j)=0 
CCNT INUE 
CCNTINUE 

CALL GGues( csi,N,un 

GGUeS( CS2,N,L2 ) , 

GGNLG(CS3,N»>M»SIGMA,LGN1 ) 

GGNLG( CS4,N,XV,SIGMA,LGN2 ) 

CLC'JD FIELD CVER 50 EY 50 AREA 
DC 777 M=1,MMM 
MINI = INT (200=*U1 (M )) 

M INJ = INT ( 20 C*U2( V ) ) 

MAXI=MI M + I NT (l.GM (M ) ) 

MAXJ=M»NJ+I NT(LGN2( M)) 

IF (MINI .L F . 1 ) ''IN 1=1 

IF(M1NJ .LE. 1) V1NJ = I 
IF(MAXI .GE. 200) MAXI=200 
IF(MAXJ ,GE, 200) MAXJ=200 
DC 11 1= MINI, MAXI 

DC 12 J=MINj,yAXJ 


• FT 1 1 F2 9 


6 

5 


CALL 

CALL 

CALL 

GENERATE 


I V( I , J ) = 1 
CCNT INUE 
CCNT INUE 
CCNT INUE 

DI STCRT RECTANGULAR 
DC 51 J= 1 » 1 99 
DC 52 1=1,199 

I F ( i V( I » J ) .N5 . 

IF( IV( I , J) »N£ • 

C CNT I NUF 
CCNT INUE 

SPECIFY EACKGROLNC RAIN 
NNGP=9C OOC 

CALL GC\°M{ CST.NNCP.U) 
K=0 

DO 2 1 1= 1 ,2C0 

DC 22 J=1»3C0 
K = K+ 1 

E ( I , J ) = U ( K ) 

22 CCNTINUE 


I 2 
1 I 
777 


52 
5 1 


PAIN CLOUDS 


IV(I+1*J+D) TV(I,J)=TNT(Ulf I)+0»5) 
IV( I ,J + l > ) IVf I , J)=I NT( U?( J) +0. 5 ) 


   21 CONTINUE




DO 23 1 = 2,2<=<5 

DC 24 J=2.2^9 

XP=E (IfJ) + . lO04<f(E( I-l^Jl+E(T+l»J)+E<T.J-l) + E(I 
V ( I • J) =EXP ( XP+1 .140) 

24 CONTINUE 
23 CONTINUE 


J+1 ) ) 


C     GET SNAPSHOTS BY MOVING THE 128 BY 128 WINDOW OF CLOUDS OVER THE
C     MOVING RAINFIELD. THE ADVECTION IS ACCOMPLISHED BY INCREMENTING II
C     AND JJ. THE RAIN RATE AT THE BOUNDARIES OF THE CLOUDS CHANGES VIA
C     THIS JOINT DYNAMIC. WRITE EACH SNAPSHOT TO TAPE AFTER IT IS
C     GENERATED.

I r-0 
J j = o 
1 1=1 
ji = i 

DC 65 NDAYS=lf2 
ICAY=ICAV+1 

OPFN(UMT=ri 1.ERR = 200.STATUS=*NEW*.FILE=FILELE( ID AY). 

CA CC ESS SEQUENT 1 aL • .ECP M= »UNF0RMATTF C».inSTAT=inS) 

DC 65 NSNP=1.24 
11=11+1 
T 11=11+127 
J J= J J+ 1 
J JJ= JJ+ 127 
DC 70 1=1.1 28 

DC 71 J= 1 . 1 2€ 

PP(I.J) = Y(IT+I + I1 .JJ + J + Jl )^IV( II + I.JJ + J) 

71 CONTINUE 
TO CONTINUE 
11=11+1 


J 1 = J 1+ 1 

TIME=TIME+DELTAT 

WRITE ( UNIT = ll ) GFHASE.DA Y.TIME .RESI 2E ^ AV A RE A . A VFTR . XM . S I G V A 
C((RP(I.J). 1=1. 12E). J=l. 123) 

66 CONTINUE 

CLOSE! UNIT= 1 1 .ERR = 3 00. STA TUS= • KEEP • . IOSTAT=ICS) 

65 CONTINUE 
100 SEED ( 1 ) = 0S1 
SEED (2 ) =DS2 
SEED (3 ) =DS3 
SEED (4 )=DS4 

200 IF ( lOS .NE . 0) ThEN 
WRITE (6.2R9) 

29R FORMAT!* •♦•I/C ERROR CN OPEN*) 

GO TO 40C 


END I^^ 

300 IF (ICS .NE. 0) THEN 
WRITE (e f 39 9) 

399 rCRVAlC* *.*I/C FPROP CN CLOSE*) 

GC TO 40C 
END IF 




400 RPTLTN 


ENO 


SUE>RCUTUE LEVPTC (N.PINC, c) 


P. L. VAWTIK ACC e CS = C 
PCT. 21 . 1 PEE 


C     THIS SUBROUTINE TAKES A 2 DIMENSIONAL ARRAY OF VALUES, UP TO 128 BY
C     128, AND BINS THEM IN 10 EQUALLY SPACED BINS RINC IN SIZE, PLUS A
C     ZERO LEVEL. VALUES OUTSIDE THE RANGE ARE SET EQUAL TO 0 OR 10. THE
C     LEVELS ARE THEN PRINTED OUT AS CHARACTERS TO REPRESENT THE CONTOUR
C     LEVELS OF THE ARRAY.
C
C     INPUT : N    - SIZE OF THE SQUARE ARRAY, R
C             RINC - SIZE OF BINS TO PLACE THE DATA IN
C             R    - 2 DIMENSIONAL ARRAY OF DATA VALUES
C
C     INTERNAL : I  - LOOP INDEX
C                J  - LOOP INDEX
C                LR - ARRAY OF INTEGER VALUES TO REPRESENT THE BINNED DATA
C                P  - ARRAY OF CHARACTERS TO REPRESENT THE BINNED DATA
C
C     RETURNS : NOTHING


3 



REAL*A P ( 12P , 1 : * ) INC 
INT^GFC*4 M ,L°( 1 25) , I ,J 
CHARACTER*! P<12S) 

WR I T E { 6 , 1 ) 

1 FORMAT (• 1 •» 'PEL AT IVF PA infall PATE*) 

OC 5 I = 1 ,N 
R ( I ) = • . • 

5 CONTINUE 

*(0 ITE( « 4 1 ) (P ( J ) , J=1,N) 

DC 10 1 = 1. N 

OC 2 0 J=1 .N 
LP< J )=0 
20 CONTINUE 

00 3C J=1.N 

IF {P(!,J) ,GT. C .0 ) LR( J )=INT ( (P( I , Jl/PINC) +1 .0 ) 

IF (LP(J) .GT. 1C) LP(J)=10 
70 CCNT INUE 

00 AO J= I .N 

GO TC ( 101 , 102 . 1 02 . 104, 1 05, 1 06 , I 07, 1 CS . 109 . 1 10 , 1 1 1 ) LP(J) + 1 

101 R ( J) =' • 

GO TC AC 

102 P(J) = * : ' 

GO TC AO 

103 R ( J ) = •- • 

GC TC AO 

104 P(J)='+* 

GC TC 40 

105 R ( J ) =• / • 

GO TC 40 

106 P{J)='H* 

GC TC 40 

107 P ( J) =• X ' 

GC TC 40 

103 P { J ) =• « • 

GC TC 40 
109 P ( J > = •>* • 

GC TC 40 
no P(J )=• a* 

GC TC 40 
111 P ( J ) = • ♦ • 

40 continue 

’a-fTTE(6 ,41) (PC J) , J=1,N) 

41 FORMAT!* *,*,*,12E A,*.*) 

10 CONTINUE 

DC 5 0 I = 1 , N 
P ( I ) = * . * 

50 CONTINUE 

WPTTE(6,41)(P(J), J=l,N) 

RETURN 

END 


ABSTRACT

A stochastic regression model is used in modeling rainrate. 
Under some conditions on the model parameters, it is shown that 
rainrate is asymptotically lognormal. An application of the model 
to the GATE data shows a remarkable agreement between the assumed 
and estimated model parameters for rainrate averaged over suffi- 
ciently large area and a sampling interval of 15 minutes. 

1. INTRODUCTION 

There is ample evidence based on observations that rain 
characteristics tend to be approximately lognormally distributed. 
This observation is shared by quite a few research workers who 
considered different data sets. These pertain to the duration of 
rainfall and amount, and to horizontal and vertical cloud extent 
in tropical and extratropical regions under a wide variety of 
convective conditions (Biondini 1976, Lopez 1977, Houze and Cheng 
1977, Chiu et al. 1986). The question is then what makes the 
lognormal distribution so prevalent when it comes to rain systems 
and whether there is any theoretical basis for these observational 
findings. On practical grounds, we may ask whether it makes 
sense at all to fit a lognormal distribution to rain 
characteristics, and under what conditions. This is the subject of 
the present note. We will focus on the lognormality of rainrate. 

Many authors believe that the lognormal distribution is a 
natural outcome of the so-called law of proportionate effect 
(Aitchison and Brown 1963, p. 22). Accordingly, $\{X_j\}$ satisfies 
the law of proportionate effect if 

    $X_j - X_{j-1} = \epsilon_j X_{j-1}$ 

where the $\epsilon_j$'s are mutually independent and are also independent 
of the $X_j$'s. While the law of proportionate effect is of fundamental 
importance in motivating the lognormal distribution, the 
independence assumption on the $\epsilon_j$ is quite restrictive and can 
in fact be relaxed. It is sufficient that the $\epsilon_j$ obey conditions 
which guarantee the asymptotic normality of sums in terms 
of these variates. For this to hold, they need not be independent 
and may even be dependent on the $X_j$'s. 

In the present note, we discuss a certain type of dynamic 
regression model which, together with less restrictive conditions, 
yields the lognormality of rainrate asymptotically. The model has 
a strong intuitive appeal and is quite flexible in that it requires 
only a few parameters which can be easily estimated from 
data. Using a novel estimation procedure, the model is fitted to 
the GATE (GARP, the Global Atmospheric Research Program, Atlantic 
Tropical Experiment) data. It is shown that some requirements for 
asymptotic lognormality are satisfied by the data. Furthermore, 
realizations produced by the model appear to be very similar to 
those produced by real rainrate data. 

It should be emphasized that our result is model based and 
that by itself does not constitute a "proof" that rainrate is 
precisely lognormally distributed. We merely provide reasonable 
conditions which lead to lognormality, and indeed some of our 
conditions are well supported by the GATE data. It seems to us 
that the present approach is an improvement over the approach 
which solely relies on the law of proportionate effect. 


2. A STOCHASTIC MODEL FOR RAINRATE 

To unravel the lognormal mystery, we begin with a rather 
naive notion of a rain element. Conditional on rain, we conceive 
of a rain element as a small volume in space containing small 
droplets of water which have the following dynamics. Let time be 
discrete. At time step n-1, some droplets give rise to a 
new generation of droplets through a complicated physical process, 
some droplets leave the volume while new ones, called immigrants, 
arrive to join the folks of the new generation. It is really a 
process of replacement and immigration where the replacement 
refers to droplets already in the volume. The droplets are being 
replaced by a non-negative number of droplets where zero could 
mean complete departure or emigration. Thus at time n, the 
number of droplets in the volume in space is the sum of the 
replacement droplets and the immigrants. Let $X_{n-1}$ stand for the 
(random) number of droplets in the volume at time n-1 and suppose 
the ith droplet there is replaced by $Y_{n,i}$ fresh droplets 
while $I_n$ denotes the number of immigrants. Then at time n, 
the rain element contains 

    $X_n = \sum_{i=1}^{X_{n-1}} Y_{n,i} + I_n, \quad n = 1, 2, \ldots$    (1) 

droplets with the convention that $\sum_{i=1}^{0} = 0$. For (1) to cover dry 
periods and shifts from dry (wet) to wet (dry) periods the following 
interpretation is adopted. Most of the time when it is not 
raining, the rain element is dry and both $X_n$ and $I_n$ vanish. 
The rain element becomes active as soon as $I_n$ admits a positive 
value. This sets the $X_n$, and hence the $Y_{n,i}$, in motion until 
the $X_n$ vanish. The process restarts when $I_n$ admits again a 
positive value. $I_n$ can be thought of as the part of the process 
responsible for the occurrence of rain storms while $\sum Y_{n,i}$ 
pertains to the duration and amount of rain. 

The most important parameters associated with the dynamic 
model (1) are 

    $E\,Y_{n,i} = m, \quad E\,I_n = \lambda, \quad n, i = 1, 2, \ldots$ 

No further assumption is needed for the present use of the model 
except for A1 and A2 below. 


When the occurrences of rain are not too frequent, we expect 
$\lambda$ to be small and close to zero. When it does rain, it usually 
persists for a while before it stops. This means that m should 
be close to 1 but still strictly less than 1. If m is greater 
than or equal to 1, the duration and amount can be explosive. 
Thus an indication of goodness of fit of (1) to rainrate data is 
a small $\lambda$ and m close to but smaller than unity. It is 
interesting to apply the model to real data to see if these 
conditions are met. 

When $\{Y_{n,i}\}$, $\{I_n\}$ are families of mutually independent 
non-negative integer valued random variables, the process $\{X_n\}$ is 
called a Galton-Watson Process with Immigration (Athreya and Ney 
1972, p. 263). This type of process was introduced as early as 
1915 by Smoluchowski whose work is reported by Chandrasekhar 
(1943, chapter 3). Smoluchowski used the model to study the 
fluctuations in the number of particles contained in a small volume 
which exhibit random motion. However, we do not necessarily 
require the Y's and I's to be independent. 

There is a well known device which transforms (1) into a more 
convenient regression equation which takes into account past 
values of $X_n$ (Heyde and Seneta 1972, Winnicki 1986). Let 
$\mathcal{F}_n$ be the $\sigma$-field generated by the random variables 
$(X_0, X_1, \ldots, X_n)$, and note that 

    $E(X_n \mid \mathcal{F}_{n-1}) = m X_{n-1} + \lambda$ 

Define $\epsilon_n$ by the difference 

    $\epsilon_n = X_n - E(X_n \mid \mathcal{F}_{n-1})$ 

and write (1) as 

    $X_n = m X_{n-1} + \lambda + \epsilon_n$    (2) 
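
The conditional mean used to pass from (1) to (2) can be spelled out in one line. The following is a sketch under an added assumption that the text does not state explicitly: given the past, each $Y_{n,i}$ has mean m and $I_n$ has mean $\lambda$. Since $X_{n-1}$ is known given $\mathcal{F}_{n-1}$, the number of summands can be treated as fixed inside the conditional expectation.

```latex
\begin{align*}
E(X_n \mid \mathcal{F}_{n-1})
  &= E\Bigl(\sum_{i=1}^{X_{n-1}} Y_{n,i} + I_n \Bigm| \mathcal{F}_{n-1}\Bigr) \\
  &= \sum_{i=1}^{X_{n-1}} E(Y_{n,i} \mid \mathcal{F}_{n-1})
     + E(I_n \mid \mathcal{F}_{n-1}) \\
  &= m X_{n-1} + \lambda .
\end{align*}
```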
Then $\{X_n\}$ is seen to satisfy a stochastic difference equation where 
$\epsilon_n$ is a martingale difference (Lai and Wei 1982); i.e., $\epsilon_n$ is 
$\mathcal{F}_n$-measurable and $E(\epsilon_n \mid \mathcal{F}_{n-1}) = 0$ for every n. An important 
example is the case of independent $\epsilon_n$ with mean 0, which is not 
required here. Other than their formal importance as expressed in 
(2), martingale differences follow the Central Limit Theorem 
under quite general conditions. 

Since $X_n$ refers to the density of droplets in the rain element, 
it is related to the rainrate. But multiplication of (2) by 
a constant leaves the model intact and we can actually think of 
$X_n$ as representing rainrate. We therefore model rainrate 
dynamics by (2) where $X_n$ admits only non-negative values. 

3. CONTINUITY ASSUMPTION 

In its present form, equation (2) is a fairly general model 
which could represent a wide range of physical and statistical 
processes. In order to ensure the lognormality of $X_n$, some more 
assumptions are needed. 

Let $\{X_n\}$, n = 0,1,..., be the stochastic process (2) which 
stands for the rainrate process at a given rain element. Assume 
that $X_0, X_1, X_2, \ldots$ are readings at times 0, T, 2T,..., 
where the sampling interval T is small. The main assumption we 
shall adhere to is that of continuity: when the sampling interval 
T is sufficiently small we require that, conditional on rain, $X_n$ 
and $X_{n-1}$ be close to each other, as is the case with many 
continuous phenomena in nature. This implies that the $\epsilon_n$, the "errors", 
are themselves small. For normality we also require the sum of 
squares of the $\epsilon_n$ to explode. More precisely, conditional on 
rain (i.e., positive $X_n$), assume 
    A1:  $|X_i - X_{i-1}|$ ... 

    A2:  $\frac{1}{n} \sum_{i=1}^{n} E[(\epsilon_i / X_{i-1})^2 \mid \mathcal{F}_{i-1}] \to c^2 > 0, \quad n \to \infty$ 

Since $E(\epsilon_i / X_{i-1} \mid \mathcal{F}_{i-1}) = 0$, and since by A1 $\epsilon_i / X_{i-1}$ is 
essentially bounded as $m \to 1$ and $\lambda \to 0$, it follows that 
(McLeish 1974, Basawa and Prakasa Rao 1980, p. 388) 

    $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \epsilon_i / X_{i-1} \Rightarrow N(0, c^2), \quad n \to \infty$ 

4. ASYMPTOTIC LOGNORMALITY OF RAINRATE 

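Let $\chi[A]$ be the indicator of the event A, and define $\delta_n$ 
by 

    $\delta_n = \frac{X_n - X_{n-1}}{X_{n-1}} \, \chi[X_{n-1} > 0]$ 

Then (2) can be written as 

    $X_n = (1 + \delta_n) X_{n-1} + I_n \chi[X_{n-1} = 0]$ 

        $= \prod_{j=1}^{n} (1 + \delta_j) X_0 + \sum_{k=1}^{n-1} \prod_{j=k+1}^{n} (1 + \delta_j) \, I_k \chi[X_{k-1} = 0] + I_n \chi[X_{n-1} = 0]$    (3) 

Thus, conditional on rain, it follows that 

    $X_n = \prod_{i=1}^{n} (1 + \delta_i) X_0$    (4) 

from which we obtain by A1 that 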
    $\log(X_n / X_0) \approx \sum_{i=1}^{n} \delta_i$    (5) 

or 

    $\log(X_n / X_0) + \sum_{i=1}^{n} [(1 - m) - \lambda / X_{i-1}] \approx \sum_{i=1}^{n} \epsilon_i / X_{i-1}$    (6) 
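
The passage from (5) to its second form uses only the definition of $\delta_i$ and model (2), together with the approximation $\log(1 + \delta_i) \approx \delta_i$ for small $\delta_i$. As a short worked step, written for the rainy case $X_{i-1} > 0$ (the symbols are those defined above; nothing new is assumed):

```latex
\begin{align*}
\delta_i &= \frac{X_i - X_{i-1}}{X_{i-1}}
          = \frac{(m - 1) X_{i-1} + \lambda + \epsilon_i}{X_{i-1}}
          = -(1 - m) + \frac{\lambda}{X_{i-1}} + \frac{\epsilon_i}{X_{i-1}}, \\
\sum_{i=1}^{n} \delta_i
         &= -\sum_{i=1}^{n} \Bigl[ (1 - m) - \frac{\lambda}{X_{i-1}} \Bigr]
            + \sum_{i=1}^{n} \frac{\epsilon_i}{X_{i-1}} .
\end{align*}
```

Moving the bracketed sum to the left-hand side of (5) gives the second form, whose right-hand side is exactly the sum to which the Central Limit Theorem of Section 3 applies.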


Therefore for m sufficiently close to 1 and $\lambda$ close to 0, A1 
and A2 imply that for large n 

    $(X_n / X_0)^{1/\sqrt{n}} \exp\Bigl\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [(1 - m) - \lambda / X_{i-1}] \Bigr\} \sim \Lambda(0, c^2)$    (7) 

where $\Lambda(0, c^2)$ denotes the lognormal distribution with parameters 
0 and $c^2$ (Aitchison and Brown 1963). When $m \to 1$ and $\lambda \to 0$, 
we obtain the useful approximation 

    $(X_n / X_0)^{1/\sqrt{n}} \sim \Lambda(0, c^2)$    (8) 

The 0 parameter is expected if we assume that $X_n$ for large n is 
independent of $X_0$ and that the two are identically distributed. 
Under these conditions both $X_n^{1/\sqrt{n}}$ and $X_0^{1/\sqrt{n}}$ are asymptotically 
$\Lambda(\mu, c^2/2)$ for some $\mu$. 
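
The remark about the zero location parameter can be checked directly on the logarithmic scale. The following sketch takes the stated independence and identical distribution as given and uses only the fact that a difference of independent normal variables is normal with the variances added:

```latex
\begin{align*}
\log X_n^{1/\sqrt{n}} \sim N(\mu, c^2/2), \qquad
\log X_0^{1/\sqrt{n}} \sim N(\mu, c^2/2) \quad \text{(independent)} \\
\Longrightarrow \quad
\log (X_n / X_0)^{1/\sqrt{n}}
  = \log X_n^{1/\sqrt{n}} - \log X_0^{1/\sqrt{n}}
  \sim N(\mu - \mu, \; c^2/2 + c^2/2) = N(0, c^2) .
\end{align*}
```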


5. STATISTICAL ESTIMATION OF m AND $\lambda$ 

A great deal of the foregoing discussion depends on m being 
close to but strictly smaller than 1, and $\lambda$ positive but close 
to 0. To verify these conditions, the parameters should be 
estimated as precisely as possible. Fortunately, this estimation 
problem is a special case of a general problem investigated in 
detail by Lai and Wei (1982) who give conditions under which the 
least squares estimates converge almost surely (that is, with 
probability one; this is abbreviated a.s.) to the respective 
parameters. Winnicki (1986) has suggested that m and $\lambda$ should 
be estimated from the weighted model 

    $\frac{X_n}{\sqrt{X_{n-1} + 1}} = m \frac{X_{n-1}}{\sqrt{X_{n-1} + 1}} + \lambda \frac{1}{\sqrt{X_{n-1} + 1}} + \epsilon^*_n$    (8) 

where $\epsilon^*_n = \epsilon_n / \sqrt{X_{n-1} + 1}$, minimizing the sum of squares of 
the $\epsilon^*_n$. The estimates obtained in this way are called weighted 
least squares and are shown, under some conditions, to be superior 
to the ordinary least squares when m is close to 1. Now, the 
Lai and Wei (1982) theory can be applied to the stochastic 
regression model (8) since $\epsilon^*_n$ in (8) is still a martingale difference. 
This is done next. 


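Denote the weighted least squares estimators by $\hat{m}$, $\hat{\lambda}$ and 
the design matrix by $\mathbf{X}_n$. Then 

    $\mathbf{X}_n = \begin{pmatrix} X_0/\sqrt{X_0 + 1} & 1/\sqrt{X_0 + 1} \\ X_1/\sqrt{X_1 + 1} & 1/\sqrt{X_1 + 1} \\ \vdots & \vdots \\ X_{n-1}/\sqrt{X_{n-1} + 1} & 1/\sqrt{X_{n-1} + 1} \end{pmatrix}$ 

Define a 2x2 matrix A by $A = \mathbf{X}_n' \mathbf{X}_n$ and let $\lambda_{\min}(n)$, 
$\lambda_{\max}(n)$ be, respectively, the smaller and larger eigenvalues of 
A. Then the relevant result of Lai and Wei (1982) can be stated 
as follows, assuming model (8). Assume 

    (a)  $\sup_n E(|\epsilon^*_n|^{\alpha} \mid \mathcal{F}_{n-1}) < \infty$ a.s. for some $\alpha > 2$, 

and that 

    (b)  $\lambda_{\min}(n) \to \infty$ a.s. such that $\log(\lambda_{\max}(n)) / \lambda_{\min}(n) \to 0$ a.s. as $n \to \infty$. 

Then 

    $(\hat{m}, \hat{\lambda}) \to (m, \lambda)$ a.s. 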


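Thus, when (a) and (b) are satisfied, the result guarantees a 
strong sense of convergence of the weighted least squares 
estimates. The estimates themselves are given in Winnicki (1986) 
as 

    $\hat{m} = \frac{ \sum X_i \sum \frac{1}{X_{i-1} + 1} - n \sum \frac{X_i}{X_{i-1} + 1} }{ \sum (X_{i-1} + 1) \sum \frac{1}{X_{i-1} + 1} - n^2 }$    (9) 

    $\hat{\lambda} = \frac{ \sum X_{i-1} \sum \frac{X_i}{X_{i-1} + 1} - \sum X_i \sum \frac{X_{i-1}}{X_{i-1} + 1} }{ \sum (X_{i-1} + 1) \sum \frac{1}{X_{i-1} + 1} - n^2 }$    (10) 

where all sums run over i = 1,...,n and n is the series size. 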
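
The closed-form weighted least squares estimates can be sketched in a few lines of Python. This is an illustration only, not the code used for the report: the function name `wls_estimates` and its list-based interface are hypothetical, and the formulas are the ones obtained by solving the normal equations of the weighted regression of $X_i$ on $X_{i-1}$ with weights $1/(X_{i-1} + 1)$.

```python
def wls_estimates(x):
    """Weighted least squares estimates (m_hat, lam_hat) for the model
    X_i = m * X_{i-1} + lam + eps_i with weights 1 / (X_{i-1} + 1).

    `x` holds the series X_0, ..., X_n as non-negative numbers.
    (Illustrative helper; name and interface are assumptions.)
    """
    prev = x[:-1]            # X_{i-1} for i = 1..n
    curr = x[1:]             # X_i     for i = 1..n
    n = len(curr)

    s_prev = sum(prev)                                      # sum X_{i-1}
    s_curr = sum(curr)                                      # sum X_i
    s_w = sum(1.0 / (p + 1.0) for p in prev)                # sum 1/(X_{i-1}+1)
    s_wprev = sum(p / (p + 1.0) for p in prev)              # sum X_{i-1}/(X_{i-1}+1)
    s_wcurr = sum(c / (p + 1.0) for p, c in zip(prev, curr))  # sum X_i/(X_{i-1}+1)

    # Common denominator: sum (X_{i-1}+1) * sum 1/(X_{i-1}+1) - n^2.
    # It is zero only when all X_{i-1} are equal.
    denom = (s_prev + n) * s_w - n * n

    m_hat = (s_curr * s_w - n * s_wcurr) / denom
    lam_hat = (s_prev * s_wcurr - s_curr * s_wprev) / denom
    return m_hat, lam_hat
```

A quick sanity check is to feed the function a noise-free series generated by $X_i = m X_{i-1} + \lambda$; the estimates should then reproduce m and $\lambda$ up to rounding error.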

Since observed rainrate is finite, condition (a) is automatically 
satisfied. To verify condition (b) analytically is 
difficult in general but it can be verified from data. The 
rainrate data we have in mind are described in the next section. For 
rainrate averages obtained from squares of 32 by 32 km^2 at 15 minute 
intervals, the results from two different time series are given in 
Table 1. The series size ranges from n=100 to n=1700, and it 
is seen that condition (b) is satisfied since $\lambda_{\min}(n)$ tends to 
infinity faster than $\log(\lambda_{\max}(n))$. Similar results were obtained 
for other time series and so, for all practical purposes, the door 
is now open to the actual estimation of m and $\lambda$ using these data. 
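
Condition (b) can be examined numerically for any observed series by forming the 2x2 matrix of sums of squares and cross products of the weighted design and taking its two eigenvalues in closed form. The sketch below is illustrative only; the function name `eigen_condition` is hypothetical, and the eigenvalue formula is the standard one for a symmetric 2x2 matrix.

```python
import math


def eigen_condition(x):
    """Return (lam_min, lam_max, log(lam_max) / lam_min) for the 2x2 matrix
    A built from the weighted design rows
        (X_{i-1} / sqrt(X_{i-1} + 1), 1 / sqrt(X_{i-1} + 1)),  i = 1..n.

    `x` holds the series X_0, ..., X_n. (Illustrative helper.)
    """
    prev = x[:-1]
    a11 = sum(p * p / (p + 1.0) for p in prev)   # sum X^2 / (X + 1)
    a12 = sum(p / (p + 1.0) for p in prev)       # sum X / (X + 1)
    a22 = sum(1.0 / (p + 1.0) for p in prev)     # sum 1 / (X + 1)

    # Eigenvalues of the symmetric matrix [[a11, a12], [a12, a22]].
    trace = a11 + a22
    spread = math.sqrt((a11 - a22) ** 2 + 4.0 * a12 * a12)
    lam_min = 0.5 * (trace - spread)
    lam_max = 0.5 * (trace + spread)
    return lam_min, lam_max, math.log(lam_max) / lam_min
```

Calling the function on growing prefixes of a series (n = 100, 200, ...) and watching the third returned value shrink toward zero mirrors the kind of check summarized in Table 1.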
Table 1. Two cases for which condition (b) is satisfied. The 
rainrate series are sampled every 15 minutes over a 
square of 32x32 km^2. 

FIRST TIME SERIES 

     n    $\lambda_{\min}(n)$    $\lambda_{\max}(n)$    $\log(\lambda_{\max}(n))/\lambda_{\min}(n)$ 

   100        6.368         97.212        0.719 
   200       53.972        185.318        0.097 
   400      108.249        351.806        0.054 
   600      142.773        533.428        0.044 
   800      151.526        722.730        0.043 
  1000      421.242        901.590        0.016 
  1200      438.366       1084.106        0.016 
  1500      475.987       1359.853        0.015 
  1700      514.737       1540.269        0.014 

SECOND TIME SERIES 

     n    $\lambda_{\min}(n)$    $\lambda_{\max}(n)$    $\log(\lambda_{\max}(n))/\lambda_{\min}(n)$ 

   100       66.853        132.213        0.073 
   200      117.430        212.229        0.046 
   400      224.335        474.801        0.027 
   600      322.439        600.600        0.020 
   800      389.040        732.568        0.017 
  1000      526.569        945.093        0.013 
  1200      576.146       1094.382        0.012 
  1500      706.008       1363.871        0.010 
  1700      729.100       1537.641        0.010 
6. APPLICATION TO GATE DATA 

We applied the model to rainfall data collected during GATE. 
GATE was conducted in the summer of 1974. During three 
periods of roughly three weeks each, detailed rainfall measurements 
from rain gauges and radars on an array of research vessels were made 
over an area called the B-Scale. The B-Scale encompasses an area 
about 400 km in diameter. Arkell and Hudlow (1977) composited 
the radar ship data and presented radar reflectivity 
scan data at 15 minute intervals. Patterson et al. (1979) converted the 
radar reflectivity data to rainrates which are binned into 4 by 4 km^2 
pixels. This data set is probably still one of the most extensive sets 
of rainfall measurements made over the oceans. 

Time series of rainrate for individual pixels (4 by 4 km^2 
resolution) and for area averages (10 by 10 pixels or 40 by 40 
km^2) have been extracted from the first three-week period in GATE 
(called Phase 1). The parameters of the model are estimated by 
the method of weighted least squares described above. Tables 2-5 
give the estimated m and $\lambda$ for 10 by 10 pixel arrays and for 
individual pixels situated at the center of the GATE area. 
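
The area averages mentioned above amount to block averaging a pixel grid. A minimal sketch follows, under assumptions that are not taken from the report: the rain field is held as a list of rows, missing pixels are marked with `None` and simply skipped, and a block with no valid pixel yields `None`. The function name `block_average` is hypothetical.

```python
def block_average(field, k):
    """Average a 2-D grid over non-overlapping k-by-k blocks of pixels.

    `field` is a list of equal-length rows; None marks a missing pixel.
    Missing pixels are skipped (an assumption of this sketch), and a block
    with no valid pixel gives None. Partial blocks at the edges are dropped.
    """
    rows, cols = len(field), len(field[0])
    averaged = []
    for r0 in range(0, rows - k + 1, k):
        out_row = []
        for c0 in range(0, cols - k + 1, k):
            values = [
                field[r][c]
                for r in range(r0, r0 + k)
                for c in range(c0, c0 + k)
                if field[r][c] is not None
            ]
            out_row.append(sum(values) / len(values) if values else None)
        averaged.append(out_row)
    return averaged
```

With 4 km pixels, `k = 10` would give the 40 km averages; applying the function to each 15 minute scan in turn produces one time series per block.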

The results for large area averages of 10 by 10 pixels are 
shown in Tables 2 and 3. For each 10 by 10 pixel array throughout 
the GATE area a time series was obtained from which m and $\lambda$ 
are estimated using (9) and (10). The estimated m are very 
close to but less than 1 except for some boundary points where 
there are missing data. At the four corners, there are no data at 
all in the 10 by 10 pixel array. The $\lambda$ field in Table 3 shows 
small values except at the boundaries where again the problem of 
missing data is encountered. We see that for large area averages 
sampled (really, visited) at T = 15 minute intervals the results 
are very satisfactory and so a lognormal fit makes good sense. 
For individual pixels (4 by 4 km^2) m is still fairly large 
although not as close to 1 as in the 10 by 10 pixel array case, 
but $\lambda$ is large, as seen in Tables 4 and 5 respectively. The 
reason for this can be attributed to the sampling interval of 15 
minutes; for smaller pixels we need to sample more often than every 
15 minutes to achieve similar results. This suggests that the model 
approaches the lognormal limit for large aggregates at the 15 
minute sampling rate, and more generally, that there exists a time 
scale which corresponds to a spatial scale. This dependence of 
the model parameters on the averaging area can be seen very 
clearly from Figures 1 and 2 where m and $\lambda$ are given as a 
function of the pixel size (i.e. the averaging area) while the 
sampling interval is fixed at T = 15 minutes. The pixel sizes 
examined are 4x4, 8x8, 16x16, 24x24, 32x32, 40x40 and 352x352 km^2. 
We therefore conclude that lognormality of rainrate can already be 
observed fairly closely by averaging over pixels whose area is 
roughly as small as 40x40 km^2 where the sampling interval is 15 
minutes. This finding is supported by a histogram plot in Figure 3 
derived from about 60000 GATE pixels of 40x40 km^2. The figure 
displays the distribution of the rainrate areal average on a 
logarithmic scale. The distribution appears to be fairly symmetric, 
in support of the above discussion. 




Table 2. ESTIMATED m FOR 10 BY 10 PIXEL AVERAGES 

Each number represents the estimate for a 10 by 10 pixel 
average. The total area covers the whole of the GATE 
area. The four corners contain no data. 

     --   .40   .69   .89   .88   .92   .86   .85   .62    -- 
    .55   .96   .98   .97   .97   .94   .93   .96   .96   .70 
    .94   .95   .96   .98   .97   .92   .97   .96   .97   .95 
    .94   .95   .96   .98   .99   .98   .98   .98   .98   .97 
    .94   .97   .98   .98   .99   .99   .99   .99   .99   .98 
    .96   .97   .97   .98   .98   .99   .99   .99   .99   .98 
    .95   .97   .98   .98   .98   .99   .99   .99   .99   .97 
    .97   .97   .98   .98   .99   .99   .99   .98   .98   .96 
    .79   .97   .98   .99   .98   .98   .99   .98   .98   .81 
     --   .97   .98   .98   .99   .99   .99   .99   .99    -- 


Table 3. ESTIMATED LAMBDA FOR 10 BY 10 PIXEL AVERAGES 

Each number represents the estimate for a 10 by 10 pixel 
area average. The area covers the whole of the GATE 
area. Data are missing in the four corners of GATE. 

      --   .077   .029   .019   .019   .019   .028   .049   .233     -- 
    .050   .024   .015   .028   .029   .033   .049   .038   .074   .506 
    .031   .025   .045   .047   .047   .089   .053   .068   .066   .086 
    .041   .073   .081   .071   .048   .055   .075   .084   .080   .089 
    .072   .094   .098   .080   .057   .059   .065   .074   .076   .094 
    .086   .099   .096   .082   .079   .056   .076   .073   .076   .115 
    .134   .092   .101   .079   .071   .061   .069   .086   .083   .157 
    .109   .106   .086   .098   .078   .083   .100   .124   .111   .170 
    .817   .125   .113   .109   .114   .106   .108   .140   .145   .350 
      --   .124   .212   .143   .123   .114   .164   .167   .502     -- 
Table 4. ESTIMATED m FOR INDIVIDUAL PIXELS 

Each number represents the estimate for a 4 km by 4 km 
area average. The total area is 40 km by 40 km, situated 
at the center of the GATE area. 

    .91   .93   .92   .93   .90   .92   .87   .88   [illeg.]   .92 
    .87   .91   .90   .91   .93   .91   .92   .87   .89   .88 
    .89   .91   .92   .93   .94   .94   .93   .90   .85   .89 
    .91   .90   .86   .85   .91   .93   .94   .94   .92   .91 
    .90   .86   .83   .88   .92   .95   .94   .90   .94   .90 
    .93   .90   .83   .88   .89   .94   .91   .92   .92   .94 
    .82   .80   .82   .87   .88   .90   .87   .92   .92   .93 
    .88   .83   .82   .89   .90   .85   .87   .92   .91   .90 
    .89   .83   .82   .89   .91   .92   .87   .89   .88   .92 
    .90   .86   .91   .91   .93   .89   .86   .86   .91   .91 

Table 5. ESTIMATED LAMBDA FOR INDIVIDUAL PIXELS 

Each number represents the estimate for an individual 
pixel. The total area covers an area of 40 km by 40 km 
situated at the center of the GATE area. 

    .86   .40   .57   .49   .66   .53   .81   .68   .72   .40 
    .50   .38   .57   .52   .43   .57   .48   .88   .75   .60 
    .44   .41   .37   .40   .38   .41   .50   .68  1.05   .60 
    .84   .39   .53   .51   .40   .37   .38   .42   .56   .54 
    .86   .52   .62   .34   .32   .22   .34   .61   .50   .75 
    .28   .39   .57   .38   .39   .21   .37   .50   .53   .45 
    .65   .67   .61   .41   .38   .36   .57   .40   .54   .45 
    .81   .50   .47   .37   .33   .57   .53   .36   .50   .61 
    .26   .45   .65   .45   .42   .33   .49   .40   .52   .40 
    .24   .52   [illeg.]   .47   .37   .46   .60   .56   .41   .45 
Figure 2. The monotone decrease in $\lambda$ as a function of the pixel 
size. 
7. SIMULATION 

We end this note with a short graphical comparison between 
time series from (1) and a typical time series from the GATE data. 
It should be noted that in the foregoing discussion we made no 
restrictions on the Y's and I's in (1) except for the requirements 
that they be non-negative integers. In fact (2) is a more general 
model since even this last restriction is removed. Thus, if (1) 
is capable of producing realizations which resemble real rainrate 
data, this shows all the more the adequacy of (2) which is the 
model we used all along in the foregoing discussion. 

Now, there are many ways to simulate (1). One simple and 
fast way is to take the Y's and I's as independent Poisson random 
variables with parameters m and $\lambda$ respectively. By this 
process we generated the time series in Figure 5. Figure 4 shows 
a typical time series from GATE which spans 100 hours. The 
sudden bursts of rain storms, duration, intensity, decay and 
interarrival times between storms in the real and simulated 
realizations are quite intriguingly similar. 
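
The Poisson simulation described above can be sketched with the Python standard library alone. This is an illustration, not the program that produced Figure 5: the function names `poisson` and `simulate`, the seed handling, and the parameter values in the usage note are all assumptions. Poisson variates are drawn with Knuth's multiplication method, which is adequate for the small means involved here.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method.

    Suitable for small means only, since exp(-mean) underflows for very
    large means. (Illustrative helper, not from the report.)
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(m, lam, n_steps, seed=0, x0=0):
    """Simulate model (1): each of the X_{n-1} droplets is replaced by a
    Poisson(m) number of droplets, and a Poisson(lam) number of immigrants
    is added. Returns the list X_0, ..., X_{n_steps}."""
    rng = random.Random(seed)
    series = [x0]
    for _ in range(n_steps):
        replacements = sum(poisson(m, rng) for _ in range(series[-1]))
        immigrants = poisson(lam, rng)
        series.append(replacements + immigrants)
    return series
```

For example, `simulate(0.95, 0.1, 400)` produces a series of 401 counts with long dry spells interrupted by bursts; the values 0.95 and 0.1 are chosen here only to reflect "m close to 1, $\lambda$ close to 0" and are not the values used for Figure 5.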
SUMMARY 


The puzzling experimental fact that rainrate tends to follow 
a lognormal distribution was explained with the aid of a model. 
Accordingly, under some conditions, as a rain storm develops, 
rainrate tends to follow a lognormal distribution. The conditions 
on the model parameters are shown to be satisfied fairly closely 
by the GATE data for time series which consist of rainrate 
averages over sufficiently large pixels observed every 15 minutes. 
A special case of the model is capable of producing 
realizations which appear to be very similar to real rainrate time 
series. Another fact is that the eigenvalue conditions sufficient 
for the almost sure convergence of the weighted least squares 
estimates are well satisfied by the GATE data. In light of all 
these consistencies it is hoped that the model (2) can serve in 
settling other intriguing facts about rain. 


Acknowledgement: The authors wish to express their gratitude to J. 